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Absh-nct 

The  Generalized  Lambda  Distribution  (GLD)  is  a  four  parameter  function  that  is  capable  of 
mimicking  the  behavior  of  a  wide  range  of  probability  density  functions  (pdfs).  Unfortunately,  the 
GLD  presently  cannot  model  every  pos.sible  type  of  pdf.  Since  the  reasons  for  this  limitation  arc 
unknown,  this  thesis  e.xamines  several  potential  problems  in  an  attempt  to  expand  the  range  of 
distributions  the  GLD  can  mimic. 

We  first  present  a  discussion  of  the  behavior  of  the  algorithm  (known  as  Powell’s  Algorithm) 
that  is  used  to  search  for  the  appropriate  GLD  parameter  values.  In  particular,  we  examine  the 
effect  of  using  this  unconstrained  search  procedure  to  find  the  parameters  subject  to  a  constraint 
that  ensures  that  the  resulting  pdf  is  valid.  We  also  develop  a  reparameterization  of  the  GLD  that 
creates  an  unconstrained  search  region.  This  does  not  expand  the  range  of  distributions  the  GLD 
can  mimic, 

We  then  use  an  extensive  numerical  investigation  to  examine  the  set  of  distributions  that 
can  be  obtained  from  combinations  of  the  GLD  parameters.  This  examination  allows  us  to  expand 
the  range  of  pdfs  that  can  be  modeled  using  the  GLD.  We  also  inspect  some  pdfs  that  cannot  be 
modeled  using  the  GLD,  as  well  as  present  an  alternative  to  the  method  of  moments  for  determining 
the  parameter  values,  using  the  recently-developed  concept  of  L-moments. 


AN  ANALYSIS  OF  THE 
GENERALIZED  LAMDDA  DISTRIBUTION 

/.  Introduction 

Today’s  military  is  faced  with  a  two-fold  problem.  Because  of  the  end  of  the  Cold  War  and 
its  resulting  budgetary  reduction.s,  it  must  reduce  its  manpower  in  all  career  fields.  In  addition, 
since  there  is  no  longer  a  clearly-defined  “enemy,’’  it  must  also  be  prepared  to  deal  with  a  myriad 
of  potential  conflicts  and  crises.  For  example,  the  military  support  for  Operation  Restore  Hope,  the 
1992-93  relief  efforts  in  Somalia,  had  to  be  thoroughly  planned  and  organized.  Military  planners 
were  faced  with  the  monumental  task  of  orchestrating  the  movement  of  over  28,000  Marine  and 
Army  troops,  as  well  as  providing  for  their  logistical  support  in  a  country  that  had  no  established 
government,  infrastructure  or  means  of  feeding  its  own  people. 

Because  of  budget  cuts,  the  planning  involved  in  dealing  with  such  situations  must  be  accom¬ 
plished  by  a  smaller  staff,  in  the  same  amount  of  time  as  before.  Planners  are  forced  to  do  more 
with  less  and  must  therefore  work  more  efficiently.  Any  cost-  or  time-cutting  measures  that  are 
available  must  be  implemented. 

One  such  measure  is  the  use  of  computer-aided  simulations.  As  a  result  of  the  rapid  improve¬ 
ment  in  the  capabilities  of  small  computer  systems,  computer  simulations  that  model  changing, 
complex,  or  unique  situations  have  become  more  practical  and  easier  to  implement.  Planning  for 
contingencies  such  as  Operation  Restore  Hope  can  be  done  in  a  fraction  of  the  time,  and  the  soft¬ 
ware  used  to  analyze  one  scenario  can  be  saved  to  model  any  other  similar  world  situation  that 
may  arise  in  the  future. 

Simulation  is  useful  in  many  military  situations.  It  can  be  used  to  assess  potential  casualty 
rates  and  mission  timetables  for  large-scale  military  operations,  such  as  Desert  Storm,  the  1992 
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Gulf  W.if.  It  can  also  be  used  to  examine  situ.ations  tha*.  are  difficult  to  model  in  the  real  world, 
such  as  the  expected  damage  resulting  from  an  ICHM  strike. 

The  problem  with  simulation,  however,  is  that  many  apjilications  require  one  to  develop 
probabilistic  models  of  various  input  variables,  usually  based  on  real-svorld  data.  For  the.se  simu¬ 
lations,  information  such  as  the  lifetime  of  a  certain  part,  the  number  of  troops  needed  to  secure 
an  airport,  or  the  radiation  pattern  from  a  nuclear  explosion  is  crucial  in  order  to  get  results  that 
“mean  something.”  However,  such  information  is  rarely  known  with  certainty.  Instead,  planners 
usually  have  a  bank  of  accumulated  data  from  previous  tests  and  exiieriences  which  they  can  use 
to  probabilistically  describe  the  range  of  possible  values  for  each  of  their  variables  of  interest. 

The  probabilistic  description  of  a  continuous  random  variable  is  usually  summarized  by  its 
probability  density  function  (pdf).  Many  different  pdfs,  including  those  for  such  well-known  dis¬ 
tributions  as  the  Normal,  Exponential,  and  Weibull,  are  available  to  the  modeler.  Each  pdf  has 
its  own  distinct  shape  (or  shapes)  and  covers  a  specific  range  of  values.  Since  most  of  these  pdfs 
are  available  in  commercial  simulation  packages,  the  modeler  must  decide  which  of  the  various 
distributions  that  are  available  best  “fits,”  or  describes,  his  set  of  data. 

An  alternative  to  this  decision  is  to  use  a  distribution  with  a  more  generalized  density  function 
that  is  not  limited  to  a  small  number  of  particular  shapes,  but  can  instead  take  on  a  wide  variety 
of  different  ones.  The  use  of  such  a  generalized  function  frees  the  modeler  from  having  to  decide 
between  two  competing  distributions  and  allows  hin;  the  leeway  to  create  an  even  better  fit  to  the 
data. 

The  Generalized  Lambda  Distribution  (GLD)  has  just  such  a  density  function.  It  was  origi¬ 
nally  created  by  Ramberg  and  Schmeiser  [12]  for  the  purpose  of  efficiently  r  odeling  and  generating 
random  variables  for  use  in  simulation  studies.  The  GLD  is  comprised  of  four  parameters,  Ai,  At, 
A3,  and  A4,  that  act  together  to  allow  adjustment  of  the  location,  scale,  and  shape  of  the  density 
function  so  that  it  is  able  to  model  different  random  variables.  In  order  to  fit  the  GLD  to  an 
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actual  set  of  data,  we  need  to  determine  appropriate  values  for  the  four  parametiTs.  This  is  usu- 
all  y  accomplished  by  first  computing  the  first  four  sample  moments  (the  mean,  variance,  skewness, 
and  kurtosis)  of  the  data.  We  then  use  a  computerized  searcli  routine  to  find  the  appropriate 
combination  of  [larameter  values  so  that  the  resulting  form  of  the  ftIT)  will  have  those  same  four 
moments. 

Because  of  its  ability  to  “match  moments,”  we  can  use  the  G1,I)  to  mimic  the  behavior 
of  most  of  the  cornnionly-used  pdfs  simply  by  setting  the  four  paraint'ters  to  appropriate  values. 
Figure  1  shows  the  ranges  of  these  pdfs  in  terms  of  their  measures  of  skewness  and  kurtosis.  Some 


Figure  1.  Characterization  of  Various  Distributions  by  Their  Skewness  and  Kurtosis 

distributions,  such  as  the  Uniform,  Normal  and  Exponential,  are  represented  by  single  points; 
others,  such  as  the  Student’s  t,  Log-Normal  and  Gamma,  are  represented  by  curves;  the  three  types 
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of  beta  distributions  are  represented  by  regions  of  values.  The  top  riglit  -hand  region,  denoted  tlie 
‘  Iir.possible  Area,"  contains  skewness-  kurtosis  combinations  that  will  never  be  exhibited  by  any 
pdf.  The  GLH,  at  present,  can  “mimic"  those  pdfs  found  within  the  shaded  region  of  Figure  1. 
Obviously,  all  but  some  of  the  U  -  and  J-shaped  beta  distributions  can  currenily  be  modeled  using 
the  GLD. 

In  order  to  make  the  GI/D  a  more  effective  tool,  we  wish  to  expand  the  shaded  region  of 
Figure  1  beyond  its  current  limits — ideally,  all  the  way  to  the  Impossible  Area.  Unfortunately,  the 
reasons  for  the  GLD’s  current  limitation  are  unknown,  and  one  of  the  objectives  of  this  thesis  is  to 
explore  these  limitations  and  develop  means  to  overcome  them. 

There  are  three  particular  questions  we  can  addre.ss  in  pursuit  of  this  objective.  One  concerns 
the  behavior  of  the  computerized  search  routine  (known  as  Powell’s  Algorithm)  used  in  the  moment¬ 
matching  process.  This  algorithm  was  originally  designed  for  searchers  over  unrestricted  regions. 
However,  in  the  ca.se  of  the  GLD,  two  of  the  parameters,  A3  and  A4,  are  required  to  have  the 
same  sign,  thereby  limiting  the  range  of  possible  values.  We  are  uncertain  of  the  behavior  of  our 
particular  implementation  of  Powell’s  Algorithm  as  it  encounters  this  restricted  area.  In  order  to  get 
a  better  understanding  of  this  problem,  we  will  perform  an  examination  of  the  algorithm’s  behavior 
as  it  approaches  and  attempts  to  cross  the  constraining  boundaries.  Wc  will  also  examine  some 


techniques  for  eliminating  the  problem  altogether.  These  efforts  should  help  us  determine  whether 
the  observed  limitation  on  the  GLD  occurs  as  the  result  of  inadequacies  in  the  implementation  of 
this  search  algorithrri. 

Second,  previous  research  has  shown  that  a  pattern  exists  in  the  values  of  A3  and  A4  that  pro¬ 
duce  certain  combinations  of  skewness  and  kurtosis.  This  pattern  suggests  the  hypothesis  that  the 
minimum  possible  value  of  kurtosis  that  the  GLD  can  have — given  a  specified  value  of  skewness- — 
occurs  when  A3  =  0.  This  implies  that  the  GLD’s  limitation  might  be  a  result  of  restrictions  on 


the  possible  values  of  its  parameters.  As  a  means  of  assessing  the  validity  of  this  hypothesis,  we 
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will  examine  the  skewness  and  kurtosis  values  that  result  from  different  combinations  of  A3  and 
Xa.  We  hope  that  the  insights  garnered  from  this  exercise  will  also  enable  us  to  determine  if  the 
limitations  on  the  GLD  can  be  overcome. 

Third,  we  will  examine  some  of  the  distributions  that  can  not  be  modeled  by  the  GLD. 
Although  these  cases  may  be  important  in  mathematical  circles,  it  is  conjectured  that  they  may 
not  have  significant  practical  applications.  Further  examination  will  help  determine  whether  this 
is  true  or  not. 

As  discussed  previously,  the  GLD  parameter  values  are  traditionally  estimated  from  the  first 
four  moments  of  the  data.  However,  the  higher  order  sample  moments  (skewness  and  kurtosis)  are 
highly  variable  and  are,  therefore,  somewhat  untru.stworthy.  Recent  research  has  suggested  that  a 
different  approach.  The  use  of  L-moments  could  be  used  to  replace  these  moments  with  potentially 
more-stable  measures.  Thus,  a  second  objective  of  this  thesis  is  to  develop  these  measures  for  the 
GLD  and  revise  the  computer  search  routines  to  utilize  this  new  method  as  an  alternative  to  the 
method  of  moments. 

Figure  1  shows  that  the  GLD  can  already  be  used  as  an  effective  tool  for  simulation  analyses. 
Hopefully  this  research  will  provide  a  deeper  insight  into  the  properties  and  limitations  of  the  GLD. 
By  conclusively  determining  the  limits  of  its  range,  as  well  as  the  types  of  distributions  that  can  not 
be  duplicated  using  it,  we  will  attempt  to  make  the  GLD  an  even  more  powerful  tool  and  further 
enhance  its  value  for  future  simulation  studies. 
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2.1  OLD  Development 


//.  Background 

Although  most  of  the  commonly-  used  continuous  probability  distributions  are  defined  in 
terms  of  their  density  functions,  f(x),  or  cumulative  distribution  functioir.  F(x),  it  is  equally  valid 
to  define  a  distribution  by  its  percentile  function,  if  that  percentile  function  exists.  The  percentile 
function  is  simply  the  inverse  of  the  distribution  function,  i.e.,  R(p)  =  or  equivalently, 

p  =  F{x).  The  percentile  function,  R{p),  is  used  similarly  to  a  distribution's  cumulative  density 
function  (cdf),  in  that  it  determines  the  value,  x  —  R(p),  such  that  the  probability  that  a  random 
variable  having  this  distribution  takes  a  value  less  than  x  is  p. 

The  ability  to  express  a  random  variable  in  terms  of  its  percentile  function  is  quite  useful  in 
Monte  Carlo  simulation  studies.  In  particular,  it  is  well  known  that  if  R  is  the  percentile  function 
of  a  continuous  probability  distribution  and  if  the  random  variable  U  is  uniformly  distributed  on 
the  (0,1)  interval,  then  the  transformation  X  =  R{U)  yields  a  continuous  random  variable  with 
percentile  function  R.  Thus,  since  sources  of  uniform  (0,1)  pseudo-random  variates  are  commonly 
available,  this  transformation  yields  a  simple  method  for  generating  pseudo-random  variates  from 
distributions  whose  percentile  functions  are  known  and  are  computationally  tractable. 

Tukey  [15]  created  a  function,  which  he  called  the  lambda  function,  in  this  manner.  Tukey’s 
function,  which  is  valid  for  all  non-zero  A,  can  be  written  as 

o<p<l.  (I) 

Filliben  [2]  used  this  function  to  approximate  symmetric  distributions  with  a  wide  range  of 
tailweights  and  noted  that  when  A  =  0.14,  the  lambda  function  resulted  in  a  “good”  approximation 
to  the  standard  normal  distribution.  He  further  noted  that  the  logistic  distribution  results  as 
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A  — *  0  while,  for  A  =  —1,  the  resulting  function  is  approximately  Cauchy,  Filliben  also  present<‘d  a 
complete  discussion  of  the  density  functions  that  result  for  various  values  of  A. 

Ramberg,  ei  al  [13]  generalized  Equation  1  to  a  four -parameter  distribution  that  could  be 
used  to  approximate  a  number  of  well-known  symmetric  and  asymmetric  distributions — noting,  in 
comparison,  that  a  close  approximation  to  the  standard  normal  results  when  Ai  =  0,  A2  =  0.1975, 
and  A3  =  A4  =  0.1350  [13:203],  Their  distribution  is  defined  by  the  percentile  function 


R{p)=)<i  +  ^ - 0<p<l.  (2) 

A2 

The  distribution  defined  by  Equation  2  is  referred  to  cu?  the  Generalized  Lambda  Distribution 
(GLD).  The  GLD  has  also  been  referred  to  as  the  Ramberg-Schmeiser-Tukey  (RST)  distribution 
in  the  literature  (see,  for  example,  Mykytka  [9]).  The  parameters  Aj  and  A2  are  location  and  scale 
parameters,  respectively,  while  A3  and  A4  are  shape  parameters  that  jointly  determine  the  skewness 
and  kurtosis  of  the  GLD.  When  A3  =  A4,  the  resulting  density  is  symmetric. 

Using  the  fact  that  x  =  F~*(p)  =  R(p),  we  can  find  the  density  function  corresponding  to 
Equation  2  by  noting 

X  _  <^F(x)  _  dp  _  /(jR(p)y* 

’  dx  dR(p)  \  dp  )  ' 

which  yields 


_ ^2 _ 

-b  A4(1  -p)-'*"* 


0  <  p  <  1. 


It  should  be  noted  that  although  Aj  does  not  appear  explicitly  in  this  expression,  f(x)  is  indeed  a 
function  of  Ai  since  it  is  defined  in  terms  of  R(p),  which  does  depend  on  Ai. 

The  cumulative  distribution  function  of  the  GLD  does  not,  in  general,  exist  in  a  simple  closed 
form,  but  this  should  not  be  a  cause  of  concern  since  it  is  also  true  of  the  normal  distribution. 
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whose  percentiles  are  more  difFicult  to  compute.  For  the  GLD,  it  is  simple  to  obtain  plots  of  the 
distribution  function  by  plotting  p  on  the  y-axis  versus  R(p)  on  the  x-axis.  Similarly,  a  plot  of 
the  density  function  is  obtained  by  plotting  f(R(p)]  on  the  y-axis  against  R(p)  on  the  x-axis,  for 
p  ranging  from  zero  to  one.  FORTRAN  programs  that  compute  R(p)  and  f[R(p)]  for  specified 
lambda  values  are  given  in  Mykytka  [9:82-84]. 

S.2  Calculation  of  GLD  Parameters 


2.2.1  Statistics  Background.  The  four  GLD  parameters  are  linked  to  the  distribution’s 
first  four  central  moments:  the  mean  (ft),  variance  skewness  (as),  and  kurtosis  {04).  For 
readers  unfamiliar  with  these  concepts,  this  subsection  will  present  a  brief  overview.  A  more 
thorough  explanation  of  these  concepts  can  be  found  in  most  statistics  textbooks.  The  information 
given  below  was  taken  from  Mendenhall,  Wackerly  and  Scheaffer  [8]. 

The  first  moment  about  the  origin  describes  the  center  of  a  pdf  and  is  commonly  referred  to 
as  the  mean  (fi).  The  variance  (<r^)  is  the  second  moment  about  the  mean  of  a  distribution  and 
describes  the  “spread”  if  its  pdf.  Unfortunately,  the  mean  and  variance  do  not  uniquely  define  a 
pdf.  Many  different  distributions  can  possess  the  same  mean  and  variance.  Therefore,  we  must 
utilize  additional  measures  (such  as  skewness  and  kurtosis)  to  distinguish  between  different  pdfs. 


The  reader  might  remember  that  for  a  particular  distribution,  the'“niean~ls^efin^d  as  the 
expected  value  of  a  random  variable  following  that  distribution,  fi  =  E{X).  This  is  the  average 
value  we  would  observe  if  we  were  to  continually  take  samples  from  the  distribution.  For  an 
empirical  data  set  with  n  observations,  we  can  estimate  fi  with  the  sample  mean,  X,  which  is 
defined  as  follows 


x  =  l'£Xf. 


1=1 


X  yields  the  average  value  of  the  data  set. 
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The  variance  gives  a  measure  of  how  wide  tlie  distribution  (or  sample  data  set)  is.  It  is  defined 
as  the  expected  value  of  the  square  of  the  difference  between  a  sample  value  and  the  mean  of  the 
distribution,  <t^  =  £'[(A'  —  For  sample  data,  this  can  be  estimated  using  the  sample  variance 


-  A  )^ 


i  =  l 

The  higher  order  moments  are  defined  similarly.  The  standardized  third  moment  about  the 
mean,  the  skewness  (03),  gives  a  measure  of  liow  symmetric  the  distribution  is.  It  is  defined  as 
the  scaled  expected  value  of  the  cube  of  the  difference  between  a  sample  value  and  the  mean  of 
the  distribution,  03  =  ^ ,  whete  the  term  in  the  denominator  is  a  scaling  factor  used  to 

make  03  a  dimensionless  measure.  For  sample  data  sets, 

i  _ 


03  = 


/t3 


If  a  distribution  is  symmetric  about  its  mean  (like  the  Normal  distribution,  for  example),  it  will 
have  a  skewness  of  zero. 

The  standardized  fourth  moment  about  the  mean,  the  kurtosis  (04),  is  a  measure  of  the 
“tailweight”  of  the  distribution.  It  can  be  roughly  thought  of  as  the  number  of  values  that  lie  in 
the  tails  of  a  distribution.  It  is  defined  as  the  scaled  expected  value  of  the  difference  between  a 
sample  value  and  the  mean,  taken  to  the  fourth  power,  04  =  We  again  use  a  scaling 

factor  (a*)  to  make  04  a  dimensionless  measure.  For  sample  data  sets,  04  can  be  estimated  using; 


04  = 


A  lower  value  of  04  signifies  that  the  distribution  will  have  “thinner”  tails  than  a  distribution  that 
possesses  a  larger  value  of  04. 
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As  mentioned  previously,  many  different  distributions  can  share  the  same  mean  and  variance. 
Fortunately,  a  distribution’s  measures  of  skewness  and  kurtosis  are  fairly  unique,  and  therefore  can 
be  used  to  distinguish  it  from  other  pdfs,  as  we  saw  in  Figure  1.  Although  we  can  only  uniquely 
define  a  distribution  using  an  infinite  number  of  its  moments,  we  will  find  that,  in  practice,  using 
just  these  first  four  moments  will  be  sufficient  for  most  purposes. 

These  four  moments  will  be  used  to  determine  values  for  the  four  parameters  of  the  GLD.  By 
setting  the  four  A  parameters  to  appropriate  values,  we  will  be  able  to  duplicate  nr.any  combinations 
of  mean,  variance,  skewness,  and  kurtosis. 

2.2.2  Parameier  Definitions.  Ramberg  and  Schmeiser  [12]  showed  that  for  A]  =  0,  the 
moment  of  the  GLD,  when  '*  exists,  is  given  by 

EiX^)  =  Aj*  0)  -  0  +  1,  A^ .  »•  +  1)  (4) 

•=o  '  ' 

where  the  beta  function  is  defined  (as  in  [1:332])  by 


P{x,y) 


r(^)r(y) 

r(x  +  y)  ■ 


Since  the  beta  function  is  undefined  whenever  either  of  its  arguments  are  negative,  we  must  assure 
both: 


AaC/t  -  i)  +  1  >  0 


and 


A4»+  1  >  0, 
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for  all  :  <  k.  Therefore,  the  k‘''  moment  does  not  exist  whenever 


min{X3,X^)  < 


(5) 


REGION  1 

X 

No  positive 
moments 
exist  . 


I 


REGION3(X3.?L4>0) 


all  positive  moments 
exist 


non-valid 

densities 


-t 


•1/2  ,-1/4 


■  first  four  moments  exist 


-1/4 


mean  and  variance  exist 


•1/2 


first  moment 
exists 


non-valid 

densities 


-1 


No  positive  integer 
moments  exist 


REGION4(A.3,X4>0) 


REGION  2 


(X^>1. 


<-l) 


No  positive 

moments 

exist 


Figure  2,  Regions  of  GLD  in  A3-A4  Space 


The  mathematical  definition  of  the  GLD  that  has  been  given  does  not  ensure  that  it  always 
defines  a  legitimate  probability  distribution.  A  valid  pdf  must  exhibit  the  following  behavior: 


fix)  >  0,  Va: 

fr^fix)dx  =  l. 
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Schmeisor  [14]  dptnrmined  that  there  are  four  regions  for  which  the  GLD  exhibits  such  behavior. 
These  regions  are  depicted  in  A3-A4  space  in  Figure  2,  and  have  been  arbitrarily  numbered  1,  2, 
3,  and  4.  It  can  be  shown  that  no  positive  moments  exist  in  Regions  1  or  2  (see,  for  example, 
[6:183-193]).  Likewise,  from  equation  5,  we  can  sec  that  the  first  four  moments  exist  in  Hegion  4 
only  when  both  A3  and  A4  are  greater  than  -i.  All  positive  moments  exist  in  Hegion  3. 

Using  Equation  4,  Ramberg  and  Schmeiser  developed  the  following  expressions  for  the  mean, 
variance,  skewness,  and  kurtosis  of  the  GLD: 


fi  =  /i(Ai,A2,  A3,  A4)  —  Ai  + — 


(C) 


=  <T"(A2,  A3,A4)  = 


Ao- 


(7) 


03  =  a^(A3,A4)  =  sijfn(A2) 


C  -  ZAB  +  2A3 
(B- 


(8) 


...  .  ,  D-4AC  +  &A^B-ZA* 

—  o(A3iA4)—  [B  —  A'^y 


(9) 


where  A,  B,  C,  and  D  are  the  following  functions  of  A3  and  A4: 


A 


B 


C 


1 

1  -I-  A3 

1 


I  +  2A3 

1 

l-b3A3 


1 

1  -b  A4 

-2/?(l-l-A3,H-A4)-b 

-  3/?(l  +  2A3, 1  -b  A4)  -I-  3/3(1  -b  A3, 1  +  2A4)  - 


1 

1  +  3A4 


(10) 

(11) 

(12) 
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D  =  ttW-  -  1  +  A4)  +  r>/?(i  +  2A3, 1  +  2A4) 

1  +  ^^3 

—40(1  +  A3, 1  +  4-  (13) 

1  +  4A4 

and  sign(X2)  will  be  either  1  or  —1,  depending  on  whether  the  A2  parameter  is  positive  or  negative. 

As  the  notation  indicates,  the  skewness  and  kurtosis  are  functions  of  A3  and  A4  alone.  The 
skewness,  03,  is  a  function  of  only  A3  and  A4  since,  in  Regions  3  and  4,  the  sign  of  A2  is  always 
the  same  as  the  sign  of  both  A3  and  A4  [9:20].  The  variance,  however,  also  depends  on  the  shape 
parameter  A2  and  the  mean  depends  on  all  four  parameters.  The  shaded  region  of  Figure  1  shows 
the  different  combinations  of  skewness  and  kurtosis  that  can  be  obtained  from  the  GbD. 


2.2.3  Techniques  for  Determining  Parameter  Values.  The  technique  knowm  as  the  method 
of  moments  is  the  usual  means  of  selecting  the  values  of  the  GLD  parameters.  By  choosing  the 
four  parameter  values  appropriately,  a  wide  range  of  distributions  can  be  duplicated,  as  indicated 
by  the  shaded  portion  of  Figure  1.  We  choose  the  parameter  values  as  follows: 


1.  Since  the  equations  for  the  skewness  and  kurtosis,  Equations  8  and  9,  are  functions  of  only 
A3  and  A4,  we  first  determine  the  values  of  A3  and  A4  that  “match”  the  desired  combination 
of  03  and  04. 

2.  Since  we  now  have  values  for  A3  and  A4,  we  use  Equation  7  to  find  a  value  for  A2  so  that  the 
GLD  has  the  desired  variance. 

3.  Equation  6  is  then  used  to  find  a  value  for  Aj  so  that  we  achieve  the  desired  mean  as  well. 

Table  1.  Parameter  Determination  Using  the  Method  of  Moments 


If  the  values  for  ft,  03,  and  04  are  unknown,  they  must  be  estimated  ifrom  the  sample  data. 
The  procedure  outlined  above  can  then  used  to  determine  the  four  lambda  p^ameters,  replacing 
ft,  as,  and  04  with  the  sample  statistics  A',  ofs,  and  ef4. 

It  should  be  noted  that  the  third  and  fourth  sample  moments,  ds,  and  04,  are  quite  sensitive 


to  extreme  observations  (values  of  A',-  located  more  than  two  standard  deviations  from  X)  and  the 
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variability  of  these  sample  moments  can  be  quite  large.  This  can,  in  turn,  result  in  a  poor,  or  even 
incorrect,  GLD  fit.  As  a  way  of  reducing  this  variability,  Ho.sking  [It]  presents  an  alternative  to  the 
use  of  sample  moments,  that  of  L-moments.  His  L-moments  are  computed  as  linear  combinations 
of  order  statistics.  When  properly  defined,  these  I-  moments  can  be  ns<‘d  in  place  of  the  traditional 
sample  moments.  He  has  shown  that  L-moments  produce  a  more  powerful  goodnesS-of-fit  test  of 
Normality  than  do  traditional  moments  [4].  We  will  consider  the  use  of  L-moments  in  selecting 
the  parameters  of  the  GLD  in  Chapter  VII. 

Ramberg  ei  al  [13:210-214]  provide  a  table  of  the  four  GLD  parameter  values  for  various 
combinations  of  skewness  and  kurtosis.  These  tables  only  cover  di.stributions  with  zero  mean  and 
unit  variance,  but  the  transformations 

Ai(0,  l)  (r-(-/i 

AiCO,  1) 

a 

can  easily  be  computed  for  cases  when  p  5^  0  and/or  /  1. 

The  calculations  for  finding  the  values  of  Ag  and  A4  via  the  method  of  moments,  given  specified 
values  of  03  and  04,  are  complicated.  Several  different  techniques  have  been  implemented.  Mykytka 
and  Ramberg  [10]  and  Mykytka  [9]  use  non-linear  programming  methods  to  find  the  minimum 
possible  sum-of-squared  errors  between  the  calculated  and  desired  values  of  03  and  04: 

Min  /(A3,A4)  =  (aaCAs,  A4)  —  03)^ -)- (Qf4(A3,A4)  —  0:4)^  (14) 

subject  io  A3  •  A4  >  0.  (15) 

Equation  15  insures  that  A3  and  A4  lie  in  either  Region  3  or  Region  4.  The  minimization  expressed 
in  Equation  14  is  performed  using  Powell’s  Algorithm  for  non-linear  function  minimization. 
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A2(P,<T^)  = 
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The  tables  of  Ramberg  ei  al  were  originally  developed  by  specifying  desired  values  for  03 
and  04,  then  finding  the  appropriate  values  of  A3  and  A4  using  a  FORTRAN  program  to  solve  the 
minimization  problem  described  in  Equation  14.  These  tables  contain  values  of  the  four  lambda 
parameters  for  values  of  03  between  zero  and  two  (in  increments  of  0.1)  and  values  of  04  in 
increments  of  0.2.  For  a  given  value  of  03,  the  values  for  04  tabled  are  the  smallest  values  for 
which  the  optimal  value  of  the  objective  function,  Equation  14,  was  appro.timately  zero.  Thus,  for 
a  given  value  of  03,  the  table  does  not  necessarily  show  the  minimum  possible  value  of  0-4  that  is 
theoretically  possible,  but  only  that  for  which  an  objective  function  value  near  zero  was  obtained. 

Mykytka  and  Ramberg  [10]  provide  a  user-friendly  FORTRAN  program  that  will  calculate 
the  four  GLD  parameter  values  for  combinations  of  skewness  and  kurtosis  not  given  in  the  tables 
of  [13].  In  addition,  Hsu  [5]  has  created  a  C-f-f  program  which  calculates  the  four  GLD  parameters 
for  a  given  data  set  and  then  allows  the  user  to  visually  "‘improve”  the  resulting  fit  to  the  data  by 
altering  the  value  of  one  or  more  of  the  parameters. 
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III.  Thesis  Objectives 

In  order  for  the  GLD  to  be  an  effective  probability  distribution,  useful  for  modeling  a  wide 
variety  of  random  variables,  we  desire  that  it  to  be  able  to  “mimic”  any  possible  combination  of 
skewness  and  kurtosis.  At  present,  it  cannot  do  so.  Mykytka  [9]  showed  a  tentative  boundary 
(reproduced  here  in  Figure  1)  for  the  range  of  03-04  combinations  that  can  be  produced  using  the 
GLD.  This  boundary  is  based  on  his  tabulated  results,  which  are  also  found  in  Ramberg  ei  at  [13]. 
However,  as  mentioned  previously,  these  tabulated  results  only  show  combinations  of  03  and  04 
for  which  the  numerical  search  procedure  produced  a  solution  with  a' near-zero  objective  function. 
Therefore,  we  are  not  assured  that  additional  combinations  do  not  exist  above  this  boundary. 

We  wish  to  take  steps  to  either  confirm  Mykytka’s  boundary  or  to  expand  the  GLD’s  coverage 
region.  However,  due  to  the  complexity  of  the  procedures  used  to  find  the  A3  and  X.i  values  that 
correspond  to  a  specified  skewness-kurtosis  combination,  there  are  several  potential  problems  that 
could  be  limiting  the  range  of  distributions  the  GLD  can  mimic.  Each  of  these  problems  will  be 
discussed  in  detail  in  the  following  chapters,  along  with  possible  methods  for  their  elimination.  We 
may  discover  that  none  of  these  problems  affect  the  GLD’s  coverage  region.  If  that  is  indeed  the 
case,  the  efforts  outlined  in  this  thesis  will  not  be  in  vain.  By  thoroughly  examining  these  concerns, 
we  will  firmly  establish  the  limitations  of  the  GLD,  which  will  be  of  value  to  future  researchers. 

In  Chapter  IV,  we  present  an  in-depth  analysis  of  the  particular  implementation  of  Powell’s 
Algorithm  used  by  Mykytka  and  Ramberg  [10].  The  algorithm  was  originally  designed  to  search 
over  an  unconstrained  region,  so  we  will  concentrate  on  its  behavior  as  it  approaches  the  constraints 
defined  by  Equation  15.  We  then  examine  the  effects  of  the  penalty  function  that  is  currently  used 
to  enforce  these  boundaries.  Finally,  we  describe  possible  remedies  to  the  problems  involved  with 
using  Powell’s  Algorithm.  We  analyze  one  possible  solution  in  particular:  a  reparameterization  of 
the  GLD  parameters  in  order  to  create  an  unconstrained  search  region. 
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In  Chapter  V,  we  present  an  analysis  of  some  possible  limitations  of  the  Cbl).  Instead  of 
determining  the  values  of  A3  and  A,j  that  correspond  to  a  particular  03-04  combination,  we  instead 
look  at  the  values  of  03  and  0.1  that  result  from  different  combinations  of  A3  and  A.^.  By  examining 
a  wide  range  of  A3-A4  combinations,  we  may  be  able  to  gain  some  insight  about  and  expand  the 
limits  on  the  GLD’s  range. 

Chapter  VI  examines  some  of  the  distributions  tl  „ ,  are  not  presently  covered  by  the  GLD. 
By  examining  these  pdfs,  ’.ve  will  get  a  better  grasp  for  the  types  of  distributions  we  can  not  yet 
model. 

As  mentioned  previously,  the  variability  of  the  higher-order  sample  moments  can  be  quite 
large,  especially  in  small  data  sets.  Because  of  this  potential  problem.  Chapter  VII  describes  L- 
moments,  an  alternative  to  the  use  of  the  traditional  sample  moments.  It  provides  a  summary  of 
the  GLD’s  four  L-moment  equations,  as  well  as  some  suggestions  for  measuring  L-moments  from  a 
empirical  data  set. 

Finally,  Chapter  VIII  summarizes  the  results  of  this  re.search  and  suggests  some  possible  paths 
for  follow-on  efforts. 
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/V’.  Analysis  of  Powell's  Algorithm 


4.J  Background 

The  search  procedure  used  by  Mykytka  and  Raniberg  [10]  to  determine  tlie  optimal  values 
for  A3  and  A4  makes  use  of  Powell's  Algorithm.  This  algorithm  wa.s  originally  designed  for  uncon¬ 
strained  non-linear  function  minimization.  However,  the  GLD  is  constrained  by  the  requirement 
that  A3  and  A4  have  the  same  sign: 

A3  •  A.^  >  0. 

As  previously  mentioned,  our  objective  is  to  minimize  the  sum-of-squared  errors  between  the 
desired  and  calculated  values  of  03  and  subject  to  this  constraint  (see  Equations  14  and  15). 
In  the  current  implementation,  “unacceptable”  values — those  combinations  that  do  not  satisfy 
Equation  15 — are  eliminated  from  consideration  by  replacing  their  sum-of-squared  errors  with  a 
large  penalty.  We  enforce  this  penalty  by  defining  the  objective  function  as 

min  Z  =  /(A3,  A<) 


where 

2  =  [(c»3(A3i  A4)  —  03)^  -f  (q’4(A3,  A4)  —  04)*];  A3  •  A4  >  0 
Z  =  lOj  A3  •  A4  <0. 

Since  an  appropriate  “match”  to  the  desired  03  and  04  values  is  produced  only  when  the  optimal 
value  of  the  objective  is  zero  (practically  interpreted  as  having  an  objective  function  value  less  than 
some  arbitrarily  small  value’),  we  can  avoid  “unacceptable”  combinations  of  A3  and  A4  by  utilizing 
this  penalty. 

*  For  example,  the  FORTRAN  code  presented  in  Mykytka  and  Ramberg  [10]  will  warn  the  user  of  2m  unacceptable 
mAtch  when  the  final  value  calculated  for  Z  exceeds  O.CXX)2. 
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Unfortunately,  we  do  not  know  how  Powell’s  Algoritlnn  acts  when  faced  with  this  constraint, 
i.e.,  does  the  inclusion  of  the  penalty  force  the  search  into  inappropriate  directions?  Our  concern 
is  that  the  penalty  might  possibly  cause  the  search  algorithm  to  exhibit  undesired  or  detrimental 
behavior,  which  might  limit  the  range  of  potential  03-rt.i  combinations  for  which  we  can  find 
solutions. 

As  noted  earlier,  the  only  valid  combinations  of  A3  and  A4  lie  in  Regions  3  and  4 — regions 
that  are  mutually  exclusive^  (see  Figure  2).  What  happens  if  our  initial  estimates  of  A3  and  A4 
are  in  Region  3  when  the  actual  result  lies  in  Region  4?  As  Powell’s  Algorithm  proceeds  towards 
(0,0)  in  the  A3-A4  space,  it  could  be  faced  with  the  problem  of  facing  a  restricted  region  in  each 
potential  direction  of  travel.  What  happens  in  this  case?  Can  we  be  certain  that  the  algcrithr  ’ 
moves  in  the  proper  direction? 

Some  preliminary  answers  already  exist.  When  using  the  FORTRAN  program  given  by 
Mykytka  and  Ramberg  [10],  the  user  must  input  an  initial  “guess”  at  the  values  for  A3  and  A4. 
Since  A3  and  A4  must  have  the  same  sign,  their  initial  values  must  be  either  both  positive  or  both 
negative.  It  can  be  shown  that  even  if  the  initial  guess  is  in  the  wrong  region  (for  example,  an 
initial  guess  in  Region  4  could  be  provided  when  the  optimal  solution  actually  lies  in  Region  3)  the 
algorithm  will  sometimes  switch  regions  to  find  the  optimal  solution.  On  the  other  hand,  due  to 
the  nature  of  the  algorithm’s  .“search  technique  (which  will  be  discussed  in  detail  in  Section  4.3),  the 
possibility  of  jumping  like  this  directly  from  one  acceptable  region  directly  to  the  other  is  slight. 
That  is,  although  the  algorithm  sometimes  does  switch  regions,  it  usually  does  not.  If,  in  practice, 
we  do  not  achieve  a  satisfactory  Z-value  when  starting  in  one  region,  we  need  simply  apply  the 
algorithm  a  second  time,  changing  our  starting  point  to  one  in  the  opposite  region  and  hopefully 
producing  a  solution  with  a  better  objective  function  value. 

^The  point  A3=A4=  0,  although  common  to  both  regions,  does  not  yield  a  valid  GLD  pdf. 
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This  behavior  suggests  that  we  could  see  some  momentary  attempts  by  tlie  algorithm  to 
find  potential  solutions  in  the  “forbidden  zone”  (j.e.  combinations  of  A3  and  A4  that  lie  outside 
of  Regions  3  and  4  in  h'igure  2)  as  the  algorithm  attempts  to  cross  from  one  region  to  the  other. 
What  affect  might  this  have  on  the  search  algorithm? 

The  more  important  cases  for  this  concern  ate  those  where  the  optimal  A3-A4  solutions  lie 
very  close  to  one  of  the  “forbidden  zone”  boundaries.  If  we  assume  that  our  initial  guess  lies  in 
the  proper  region,  we  can  reasonably  e.vpect  that  the  magnitude  of  Z  will  be  decreasing  as  Powell’s 
Algorithm  approaches  its  optimal  solution  (minimum  Z-value).  As  the  algorithm  nears  this  optimal 
point,  its  search  might  reach  into  the  “forbidden  zone.”  The  algorithm  is  suddenly  faced  with  a 
“large”  increase  in  the  .^-value  (namely,  Z  =  10)  in  that  particular  direction.  Does  this  bias  the 
search  in  any  way?  Such  a  problem  might  cause  the  search  to  proceed  in  an  inappropriate  direction 
or  prevent  it  from  converging  to  a  solution  that  lies  near  the  constraining  boundaries.  As  a  result, 
the  algorithm  might  fail  to  converge  to  a  solution  with  a  non-zero  objective  function  value  and  the 
particular  a^-on  combination  would  be  considered  to  be  infeasible,  even  though  a  valid  solution 
does  indeed  exist. 

^.2  Methodology 

To  investigate  this  problem,  we  will  examine  the  behavior  of  the  algorithm,  step-by-step, 
as  it  proceeds  near — or  attempts  to  cross — the  limiting  boundaries.  This  will  be  accomplished 
by  altering  the  FORTRAN  code  of  Mykytka  and  Ramberg  [10]  to  print  each  of  the  intermediate 
points  in  a  particular  search.  By  examining  the  intermediate  steps  generated  by  the  algorithm  as 
it  proceeds  to  its  final  solution,  we  can  get  a  more  complete  understanding  of  its  behavior  and  a 
fuller  confidence  in  its  results. 

We  will  also  inspect  the  behavior  of  the  algorithm  when  we  provide  a  starting  value  in  the 
“opposite”  region.  We  know  there  are  some  cases  where  the  algorithm  does  cross  from  one  region 
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to  the  other  in  searching  for  its  optimal  value.  By  comparing  the  results  of  these  searches  to  those 
when  starting  points  in  the  proper  region  are  used,  we  will  get  an  insight  into  the  behavior  of  the 
algorithm  as  it  attempts  to  search  for  a  solution  in,  or  in  the  direction  of,  the  “forbidden  zone.” 

In  Section  4.4,  we  will  examine  cases  where  the  optimal  A3  and  A4  solutions  lie  close  to  the 
constraining  boundaries.  By  examining  the  results  of  searches  performed  with  and  without  the 
penalty  function,  we  will  be  able  to  evaluate  the  effect  of  the  penalty  function  on  the  algorithm. 

VVe  will  use  the  tabulated  values  of  Ramberg  et  at  [13:210-214]  as  a  source  of  test  values  for 
this  effort.  In  particular,  we  will  be  examining  values  near  (or  past)  the  point  where  their  particular 
implementation  of  Powell’s  Algorithm  failed  to  converge.  Although  we  will  use  the  FORTRAN  code 
presented  in  Mykytka  and  Ramberg  [10]  instead  of  the  code  used  to  generate  the  tables  in  [13],  the 
two  versions  are  nearly  identical  and  extensive  experience  with  the  code  suggests  that  we  can  be 
confident  that  they  will  yield  identical  results. 

4-3  Unconstrained  Behavior  of  the  Algorithm 

As  a  starting  point,  we  examine  the  search  techniques  used  in  Powell’s  Algorithm.  By  studying 
how  the  algorithm  usually  performs  searches  in  an  unconstrained  space,  we  will  hopefully  be  able 
predict  its  behavior  when  it  is  presented  with  limiting  boundaries.  We  will  use  Powell’s  original 
paper,  [11],  as  well  as  the  FORTRAN  code  of  Kuester  and  Mize  [7],  as  references  for  this  analysis. 

Powell  uses  a  variation  of  the  method  of  minimizing  a  function  of  several  variables  by  changing 
only  one  parameter  at  a  time.  His  method  uses  conjugate  search  directions  at  each  iteration,  which 
results  in  a  “fast”  rate  of  convergence. 

Each  iteration  of  the  algorithm  consists  of  linear  searches  down  n  independent  directions, 

•  •  M^n,  where  n  is  the  number  of  variables  for  which  we  desire  values.  We  start  from  the  best 
approximation  of  the  minimum,  po.  Initially,  the  search  directions  are  chosen  to  be  in  the  direction 
of  the  coordinate  axes  and  po  is  simply  our  initial  “guess”  of  the  point  that  yields  the  optimal 
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objective  function  value.  Each  iteration  defines  a  new  search  direction,  ^  and,  if  a  test  is  passed, 
this  new  direction  replaces  one  of  the  original  search  directions.  A  description  of  an  iteration  of 
the  algorithm  is  given  in  Table  2. 

1.  For  r  =  1, 2, . . . ,  n  calculate  Vr  so  that  /(pr-i  +  is  a  minimum  and  define  pr  =  Pr-i  + 
Vrir- 

2.  Find  the  integer,  m,  1  <  m  <  n,  so  that  [/(pm-i)  —  /Pm)]  is  a  maximum,  and  define 
A  =  /(pm-l)-/(Pn)- 

3.  Calculate  /a  =  /(2pn  -  po),  and  define  /i  =  /(po)  and  fi  =  /(pm)- 

4.  If  either  /a  >  /i  and/or  (/j  —  2/2  +  /a)  •  (/i  —  /a  -  A)*  >  ~  /a)^,  use  the  old  directions, 

•  •  •  i^n  fof  the  next  iteration  and  use  p„  as  the  next  po,  otherwise, 

5.  Defining  ^  =  (p„  —  po),  calculate  u  so  that  /(pn  +  t'O  is  a  minimum.  Use 

^11^2)  •  •  •  i^m-ii^m+ii^m+2i^n,^  as  the  directions  and  Pn  +  as  the  starting  point  for  the 
next  iteration. 

Table  2.  Iteration  Procedure  for  Powell’s  Algorithm 

The  process  outlined  above  is  a  modification  to  his  original  method,  requiring  a  larger  num¬ 
ber  of  iterations,  but  in  [11],  Powell  states  that  is  a  valuable,  and  in  some  instances,  essential 
modification.  The  criterion  for  convergence  is  given  in  Table  3  (taken  from  Powell  [11:158]). 

1.  Apply  the  normal  procedure  until  an  iteration  causes  the  change  in  each  variable  to  be  less 
that  one-tenth  of  the  required  accuracy,  denote  the  relevant  point  as  a. 

2.  Increase  every  variable  by  ten  times  the  required  accuracy. 

3.  Apply  the  normal  procedure  until  an  iteration  causes  the  change  in  every  variable  to  be  less 
than  one-tenth  of  the  required  accuracy  again.  Denote  the  resultant  point  as  b. 

4.  Find  the  point  at  which  the  function  is  minimized  over  the  line  through  a  and  b,  denote  this 
point  as  c. 

5.  Assume  ultimate  convergence  if  the  components  of  (a  —  c)  and  (b  —  c)  are  all  less  than 
one-tenth  the  required  accuracy  in  the  corresponding  variables,  otherwise 

6.  Include  the  direction  (a  —  c)  in  place  of  ,  and  restart  the  procedure  from  Step  1. 

Table  3.  Criterion  for  Convergence  of  Powell’s  Algorithm 


In  the  FORTRAN  code  of  Kuester  and  Mize  [7]  (which  is  the  same  code  used  by  Mykytka  [9] 
and  Mykytka  and  Ramberg  [10]),  the  linear  searches  are  performed  using  quadratic  approximation 
techniques  (see  Chapter  7  of  [7]).  The  function  is  evaluated  at  three  different  points:  /(p),  f{p+qi), 
and  either  fip+2q()  or  f{p-q^),  depending  on  whether  /(p)  is  less  than  or  greater  than  /(p+f/O- 
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The  term  q  represents  the  length  of  the  step  along  the  line,  and  p  represents  the  current  point  in 
the  search  process.  These  three  points  are  used  to  determine  whether  the  function  is  at  a  local 
minimum.  If  it  is  not,  the  three  points  are  used  to  generate  the  “turning  value,"  which  is  calculated 
using  a  quadratic  function  of  the  points  and  their  respective  function  values.  The  “turning  value" 
determines  in  which  direction  the  search  will  continue. 

In  our  particular  case,  we  should  not  expect  the  algorithm  to  venture  into  the  “forbidden 
zone.”  If  one  of  the  algorithm’s  quadratic  approximation  searches  attempts  to  enter  this  area,  the 
large  value  for  the  penalty  function  should  cause  it  to  reverse  course  and  head  in  the  opposite 
direction — back  into  one  of  the  valid  regions.  If  we  only  searched  in  the  directions  of  the  coordinate 
axes,  it  would  be  almost  impossible  for  the  algorithm  to  begin  a  search  in  one  region  and  conclude 
in  the  other.  However,  since  the  search  directions  will  most  likely  change  as  we  iterate  through  the 
process,  it  is  possible  that  the  algorithm  will  “switch”  regions,  although  it  appears  this  behavior 
will  occur  only  in  cases  where  Powell’s  Algorithm  can  move  directly  from  one  valid  region  to  the 
other  during  the  course  of  a  single  quadratic  approximation  search. 

To  test  this  assumption,  we  examined  several  cases  where  our  starting  point  was  in  the  wrong 
region,  and  the  solution  algorithm  switched  to  the  appropriate  region  for  the  optimal  result.  In  all 
the  cases  that  were  analyzed,  some  common  behaviors  were  observed.  First  of  all,  the  algorithm 
always  switcheo  regions  near  the  (0,0)  point  in  the  A3-A4  space.  Secondly,  none  of  the  quadratic 
approximation  searches  terminated  in  the  forbidden  zone.  In  all  cases,  one  particular  quadratic 
approximation  search  had  a  starting  point  in  the  initial  region  and  an  ending  point  in  the  opposite 
region.  The  algorithm  continued  its  search  from  this  new  point  and  eventually  found  the  optimal 
solution. 

We  present  the  results  of  one  particular  case  here  as  an  example.  For  this  analysis,  we  used 
the  FORTRAN  code  of  Mykytka  and  Ramberg  [10]  to  find  the  lambda  parameters  for  the  following 
combination  of  the  first  four  moments:  =  0.0,  =  1.0,  03  =  0.0,  04  =  6.0.  This  combination 
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is  included  in  the  tables  of  Ramberg  ti  al  [13]  with  the  following  results:  A]  =  0,  Ao  =  -0.1686, 
As  =  —0.0802,  A4  =  —0.0802.  Obviously,  this  result  lies  in  Region  4  of  Figure  2.  Two  different 
starting  points  were  used.  First,  we  used  A3  =  A4  =  -0.05,  which  also  lies  iii  Region  4.  We  then 
used  A3  =  A4  =  0.5,  which  is  in  Region  3.  The  FORTRAN  code  yielded  the  tabulated  values  for 
both  Cases,  indicating  that,  in  the  latter  case,  the  search  switched  regions. 

The  particular  steps  taken  in  the  latter  search  (A3  =  A4  =  0.5)  are  presented  in  Figure  3. 
The  points  in  Figure  3  represent  the  termination  points  of  each  iteration  of  Powell’s  Algorithm, 
while  the  segments  connecting  them  represent  the  path  taken  from  the  initial  point  in  Region  3 
(top  right-hand  corner)  to  the  optimal  point  in  Region  4  (lower  left-hand  corner).  As  can  be  seen, 
the  algorithm  crosses  regions  near  the  (0,0)  point,  and  no  iteration  termination  points  lie  in  the 
forbidden  zone.  Figure  4  shows  the  termination  points  of  all  the  quadratic  approximation  searches 
used  as  the  algorithm  approached  and  crossed  the  boundary.  Again,  the  segments  represent  the 
path  taken  by  the  algorithm.  As  we  expect,  none  of  the  termination  points  for  the  quadratic  ap¬ 
proximation  searches  lie  in  the  forbidden  region.  The  algorithm,  therefore,  is  obviously  capable  of 
jumping  directly  from  one  region  to  the  other. 

4-4  Constrained  Behavior  of  the  Algorithm 

Our  next  concern  was  with  regard  to  cases  where  the  values  for  A3  and  A4  that  minimize 
Equation  14  lie  within  Regions  3  or  4  and  close  to,  or  on,  one  of  the  limiting  boundaries.  Our  analysis 
of  the  algorithm  suggests  that  it  might  be  possible  for  the  algorithm  to  “miss”  the  optimal  value. 
An  example  will  demonstrate  this.  Let  us  assume  that  an  iteration  of  Powell’s  Algorithm  concludes 
at  a  point  near  the  optimal  value  (which  lies  arbitrarily  near  one  of  the  limiting  boundaries),  but  the 
point  does  not  cause  the  algorithm  to  meet  the  termination  criteria  given  in  Section  4.3.  Another 
set  of  quadratic  approximation  searches  is  therefore  performed.  Let  us  further  ^tssume  that  the 
first  such  search  is  correctly  in  the  direction  of  the  optimal  point,  but  also  in  the  direction  of  the 
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“forbidden  zone."  If  the  quadratic  approximation  search  encounters  a  candidate  solution  either 
in  the  “forbidden  zone"  or  at  a  point  where  the  value  of  the  function  is  now  much  larger  than 
before,  the  search  process  will  switch  directions  and  search  away  from  the  optimal  point.  If  this 
were  to  happen  for  several  successive  iterations,  it  would  seem  possible  for  the  algorithm’s  search 
termination  criteria  to  be  met  at  some  point  other  than  the  optimal  solution-  at  the  “wrong  point.” 
Hopefully  the  additional  searches  detailed  in  Steps  2-5  of  Table  3  will  prevent  this  from  happening, 
but  we  cannot  be  sure. 

The  choice  of  starting  values  for  A3  and  A4  play  a  role  in  determining  the  solution  at  which 
the  algorithm  will  terminate.  Because  of  this,  one  possible  remedy  to  the  potential  problem  we  have 
described  might  be  to  perform  additional  searches,  using  different  starting  points,  as  a  check.  If 
these  agree,  then  one  might  have  more  confidence  that  this  potential  problem  was  not  encountered. 

Another  solution  to  this  problem  would  be  to  decrease  the  step  size,  q,  used  in  the  quadra¬ 
tic  approximation  searches.  By  decreasing  the  interval  between  successive  search  points,  we  can 
reduce  the  possibility  of  the  above  situation  occurring,  since  we  hop  ,-fully  would  be  less  likely  to 
“skip  over”  the  optimal  point  or  step  into  the  “forbidden  zone.”  This  is  done  at  a  cost,  however. 
By  decreasing  the  step  size,  we  might  increase  the  number  of  iterations  needed  to  generate  the  final 
solution,  which  in  turn  will  require  more  CPU  time  ,0  solve  a  particular  problem. 

Despite  these  possible  solutions,  we  are  still  not  certain  as  to  what  effect  the  current  penalty 
function  has  on  the  search  procedure.  Does  its  inclusion  force  the  search  algorithm  into  incorrect 
directions?  How  likely  is  this  situation  to  occur?  This  is  very  hard  to  predict. 

As  a  way  of  answering  these  questions,  we  modified  the  FORTRAN  program  of  Mykytka  and 
Ramberg  [10]  to  allow  any  combination  of  A3  and  A4,  i.e.,  Equation  15  is  ignored.  By  doing  this, 
the  algorithm  will  not  be  affected  by  the  penalty  function,  although  we  must  remember  that  any 
combination  of  A3  and  A4  that  lies  outside  of  Regions  3  and  4  of  Figure  2  corresponds  to  an  invalid 
pdf. 
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By  comparing  the  resulting  values  for  the  constrained  and  unconstrained  versions,  we  will 


be  able  to  determine  whether  the  current  penalty  function  adversely  affects  the  algorithm.  If  the 


penalty  function  is  affecting  the  algorithm,  we  should  see  a  difference  in  the  final  values  of  A3  and 


A4  generated  by  the  two  methods.  If  it  has  no  effect,  there  should  be  no  difference  between  the  two 


methods.  We  choose  to  examine  a  number  of  points  that  lie  on  the  limiting  boundaries  (f.e.,  either 
A3  or  A<  equals  zero),  since  these  are  the  cases  where  the  penalty  function  will  be  a  factor.  The 


03-04  combinations  given  in  Table  4  were  chosen  using  an  adapted  version  of  the  FORTRAN  code 
of  Mykytka  and  Ramberg  [10].  This  revised  version  calculates  the  resulting  03-04  combination 


when  given  specific  values  of  A3  and  A4.  A  summary  of  the  results  is  given  in  Table  4. 


Desired 

Desired 

CONSTRAINED 

UNCONSTRAINED 

03 

04 

Aa 

A4 

Min  value 

A3 

X4 

Min  value 

-0.5656 

2.4000* 

0.4999 

4.873  X  lO-^ 

2.604  X  10-“ 

0.5000 

-4.237  X  10“® 

3.75  X  10-'^ 

-1.0498 

3.6964 

0.2500 

6.577  X  10“' 

1.877  X  lO-® 

0.2500 

-6.049  X  10-*^ 

3.3387  X  10-'^ 

0.3636 

1.8701 

1.5000 

1.282  X  lO-" 

2.671  X  10->" 

1.5000 

-2.256  X  10-*^ 

8.44  X  10“''^ 

0.4500 

2.2000 

4.986  X  10-® 

0.5812 

1.554  X  10-® 

4.986  X  10-® 

0.5812 

1.554  X  lO-*" 

0.0000 

1.8000 

1.751  X  10-' 

1.0000 

3.223  X  lO-*-* 

6.591  X  10- 

1.0000 

3.94  X  10“^“ 

1.0498 

3.6964 

1.576  X  lO-o 

0.2500 

1.093  X  lO-" 

1.576  X  lO-' 

0.2500 

1.093  X  lO-'' 

0.1897 

1.9009 

1.491  X  lO-" 

0.7999 

1.749  XlO-''> 

1.491  X  10-“ 

0.7999 

1.740  X  lO-'" 

-0.1969 

1.7961 

1.849  X  10-'> 

1.250 

4.647  X  10“*' 

1.849  X  10“* 

1.250 

4.647  X  10-“' 

*  (he  A3  and  A^  values  given  for  the  constrained  search  yielded  014  sz 

2.4001. 

Table  4.  Comparison  of  Unconstrained  and  Constrained  Searches 


As  Table  4  shows,  the  search  procedure  yields  almost  identical  solutions  for  the  two  cases.  The 
columns  labelled  “Min  value”  represent  the  sum-of-squared  errors  calculated  using  the  displayed 
values  of  A3  and  A4. 

An  examination  of  the  table  shows  that  along  the  A3  =  0  boundary  («.e.  the  last  five  cases  of 
Table  4),  the  two  searches  yield  almost  identical  values  for  A3,  A4,  and  the  sum-of-squared  errors. 
In  fact,  the  results  are  identical — except  in  the  case  where  03  =  0.0  and  04  =  1.8.  In  this  case,  the 
unconstrained  search  yields  slightly  smaller  values  for  A3  and  the  sum-of-squared  errors  than  the 
constrained  search  does.  Even  though  this  is  true,  the  errors  are  still  sufficiently  close  to  zero,  and 
for  all  intents  and  purposes,  the  two  A3-A4  combinations  are  the  same.  We  can  therefore  conclude 
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that  the  penalty  function  did  not  adversely  affect  the  search  algorithm  for  searches  along  the  A3  =  0 
boundary. 

For  all  the  cases  along  the  A4  =  0  boundary,  a  smaller  final  objective  function  value  is 
possible  using  the  unconstrained  search.  The  sum-of-squared  errors  for  both  the  unconstrained 
and  constrained  cases  are  approximately  equal  to  zero,  but  the  unconstrained  A3-A4  combinations 
found  in  Table  4  are  in  the  “forbidden  zone”  and  therefore  do  not  represent  valid  pdfs.  Although 
the  resulting  A3-A4  corhbinations  from  the  constrained  searches  have  slightly  larger  sum-of-squared 
errors,  they  still  correspond  to  valid  pdfs.  For  both  cases,  the  resulting  values  for  A4  are  sufficiently 
close  to  zero  so  that  we  can  safely  assume  A4  =  0.  Since  the  corresponding  A3  values  are  almost 
identical  as  well,  the  penalty  function,  therefore,  does  not  adversely  affect  the  search  algorithm 
near  the  A^  =  0  boundary  either. 

From  the  cases  we  have  examined,  we  can  see  that  the  use  of  the  current  penalty  function 
does  not  present  a  problem.  Powell’s  Algorithm  behaved  consistently — with  or  without  the  penalty 
function.  Therefore,  we  are  reasonably  confident  that  we  can  eliminate  the  penalty  function  as  a 
source  of  error. 

Reparameterizaiion 

4.5.1  Derivation.  Based  on  the  results  of  the  previous  sections  of  this  chapter,  it  appears 
that  we  can  reasonably  expect  Powell’s  Algorithm  to  move  in  the  proper  fashion  when  faced  with 
our  constraint.  However,  we  can  not  forget  that  Powell’s  Algorithm  was  expressly  designed  for 
unconstrained  function  optimization.  We  thus  can  never  be  entirely  sure  that  the  algorithm  will 
always  behave  properly  when  faced  with  a  constrained  search  region.  It,  therefore,  may  be  worth¬ 
while  to  discuss  some  techniques  to  avoid  using  an  unconstrained  algorithm  in  this  constrained 
space. 
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The  first,  and  perhaps  simplest  method,  is  to  simply  ignore  the  constraint  altogether.  For 
the  cases  shown  in  Section  4.4,  the  values  of  A3  and  .'4  calculated  using  an  unconstrained  version 
of  Powell’s  Algorithm  were  almost  identical  to  those  found  using  the  constrained  .search  a,rea. 
Therefore,  we  could  search  with  Powell’s  Algorithm  over  an  unconstrained  region,  and  simply 
discard  any  A3-A4  combination  which  lies  outside  of  Regions  3  or  4.  On  the  other  hand,  how  do  we 
determine  appropriate  A3  and  A4  values  for  these  cases?  Should  we  simply  conclude  that  these  cases 
are  infeasible?  The  use  of  a  different  starting  point  might  be  an  option,  since  the  objective  function 
is  known  to  be  multi-modal.  However,  if  no  appropriate  values  result  from  these  subsequent 
searches,  we  would  be  forced  exclude  that  particular  03-04  combination  from  consideration,  i.e., 
we  would  conclude  that  that  skewness-kurtosis  combination  can  not  be  modeled  by  the  GLD. 

A  second  approach  is  to  eliminate  Powell’s  Algorithm  altogether  and  replace  it  with  a  similar 
constrained  algorithm.  This  would  involve  investigating  potential  replacements,  and  then  imple¬ 
menting  a  new  routine.  Since  we  are  uncertain  that  any  other  algorithm  would  perform  any  better, 
this  could  be  a  time-consuming,  and  perhaps  unnecessary,  process. 

A  third  possibility  is  to  reparameterize  the  GLD  so  that  its  parameters  are  unconstrained. 

By  doing  this,  we  can  eliminate  the  need  for  the  constraint.  Equation  15,  thereby  changing  the 
minimization  problem  to  one  involving  an  unconstrained  search  —  a  situation  for  which  Powell’s 

Algorithm  was  expressly  designed.  This  last  method  seems  to  hold  the  most  promi.se  for  a  quick, _ 

easy  solution  to  the  problem. 

There  are  several  different  options  for  reparameterizing  the  GLD.  One  of  the  easiest  is  the 
following: 

ffi  =  ln(A,);  Aj  =  e*‘ 

when  A3,  A4  >  0,  and 

=  ln(-A.);  Ai  = 


\ 

4 

\ 


I  - 
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when  A3,  A4  <  0. 

By  changing  the  variables  in  this  manner,  we  create  an  unconstrained  search  area.  Since 
e*  >  0  for  all  x,  we  are  eissured  that  the  original  parameters,  A3  and  A4,  will  always  have  the  same 
sign,  and  we  therefore  do  not  need  the  A3  •  A4  >  0  c.3nstraint. 

It  should  be  noted  that  this  reparameterized  version  will  not  be  able  to  switch  from  one  valid 
region  to  the  other,  as  the  original  sometimes  did.  This  is  not  a  big  sacrifice,  since  we  know  that  the 
original  usually  did  not  switch  regions,  and  therefore  could  not  rely  on  such  behavior.  If  the  new 
search  algorithm  fails  to  achieve  an  acceptable  solution  when  given  a  starting  value  in  one  region, 
wc  need  simply  rerun  the  FORTRAN  program  using  a  starting  point  in  the  opposite  region. 

The  reparameterization  is  accomplished  by  altering  the  FORTRAN  code  used  by  Mykytka 
and  Ramberg  [10].  The  main  change  involves  switching  the  starting  values  for  A3  and  A4  to  the 
new,  reparameterized  form.  After  the  initial  A  values  are  input  by  the  user,  they  are  calculated  in 
0  form  before  being  sent  to  the  FORTRAN  subroutine  which  performs  Powell’s  Algorithm. 

The  minimization  problem  therefore  becomes: 

min  Z  =  f{03,0i)  =  [(03(^3. ^4)  -  <»3)^  +  («4(<^3,^4)  —  04)^], 


which  would  involve  recalculating  the  equations  for  03  and  04,  Equations  8  and  9,  using  the  new 
theta  parameters— la  time-consuming  process.  An  alternative  to  this  is  to  perform  the  Powell’s 
Algorithm  search  procedure  in  the  unconstrained  0  space,  but  evaluate  the  Z-values  at  each  point 
in  the  iteration  using  the  A  values  that  correspond  to  the  calculated  0s.  This  is  easy  to  accomplish, 
since  the  FORTRAN  program  utilizes  a  separate  subroutine  to  calculate  the  values  of  03  and 
04  for  a  given  A3-A4  combination.  We  need  simply  change  the  0s  back  into  their  respective  As 


upon  entering  this  suoroutine  to  determine  the  values  of  03  and  04.  We  therefore  use  the  new 
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theta  parameter^  only  in  the  subroutine  which  performs  Powell’s  Algorithm,  w’hile  the  rest  of  the 
program  remains  more  or  less  the  same. 

In  order  to  prove  the  validity  of  this  technique,  the  method  was  checked  with  03  -  04  com¬ 
binations  documented  in  the  tables  of  Ramberg  ei  al  [13].  The  new  method  did  achieve  the  same 
values  of  A3  and  A4,  although  sometimes  two  separate  searches  were  necessary  in  the  reparameter¬ 
ized  case^.  Since  the  results  of  the  two  methods  were  identical,  we  can  attempt  to  expand  to  values 
beyond  the  tabulated  range. 

4-5.2  Resitlls.  This  expansion  was  accomplished  by  first  selecting  a  starting  03-04  com¬ 
bination  from  the  tables  of  [13].  By  choosing  a  03-04  combination  who.se  A3  and  A4  values  are 
already  known,  we  assure  the  revised  FORTRAN  program  is  working  properly  and  give  ourselves 
a  basis  for  comparison  between  the  original  and  reparameterized  versions.  After  the  starting  03 
and  04  values  are  chosen,  we  use  the  repararneterized  FORTRAN  program  to  find  the  appropriate 
A3  and  A4  parameters.  If  the  values  of  A3  and  A4  yield  a  siim-of-squared  error  below  our  accept¬ 
able  tolerance  (approximately  0.0001),  the  value  of  04  is  decremented  by  0.2  and  the  FORTRAN 
program  is  run  again.  When  the  sum-of-squrced  errors  exceeds  our  tolerance,  we  chose  a  different 
value  of  tt3  and  restart  the  process. 

l^yeral  different  values  of  03  weie  chosen  For  the  03-04  combinations  listed  in  the  tables 
of  [13],  the  reparameterized  version  yielded  the  same  A3-A4  combinations.  However,  the  reparam¬ 
eterized  version  was  not  able  to  expand  beyond  the  tabulated  values  of  Ramberg  et  al. 

4.6  Conclusions 

This  chapter  has  looked  extensively  at  our  particular  implementation  of  Powell’s  Algorithm 
and  our  current  penalty  function  as  sources  of  possible  limitations  on  the  range  of  03-04  combi¬ 
nations  that  can  be  modeled  using  the  GLD.  Section  4.1  examined  the  inner  workings  of  Powell’s 

*Thi8  was  usually  because  the  initial  guess  lay  in  the  opposite  region  as  the  final  result. 
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Algorithm  to  provide  a  better  understanding  for  how  the  search  procedure  is  actually  performed. 
It  also  discussed  the  potential  problems  the  penalty  function  might  cause  when  trying  to  find  A3-A.1 
solutions  located  near  the  limiting  boundaries. 

Section  d.d  showed  that  the  penalty  function  does  not  adversely  affect  the  search  algorithm 
when  searching  for  values  located  near  the  boundaries.  Table  4  shows  that  the  results  between  the 
constrained  and  unconstrained  searches  were  consistent  along  the  A3  =  0  boundary,  and  that  only 
slight  differences  existed  between  the  two  sets  of  searches  along  the  A4  =  0  boundary.  Even  though 
the  unconstrained  search  yielded  lower  sum  -of-squared  errors  along  the  A4  =  0  boundary  than  the 
cons>ained  case,  the  resulting  As-A^  combinations  were  in  the  “forbidden  zone.”  The  results  from 
the  constrained  case  along  the  A4  =  0  boundary  were  inside  the  ferisible  region,  and  although  the 
sum-of-squared  errors  were  slightly  larger  than  the  unconstrained  case,  they  were  still  well  below 
acceptable  tolerances. 

Section  4.5  described  alternatives  to  the  current  situation.  In  particular,  it  described  the 
results  of  a  reparameterizatioa  of  the  A  parameters  to  parameters  that  are  unrestricted  in  sign. 
These  unrestricted  parameters  could  not  improve  the  range  of  03-Qf4  combinations  that  can  be 
modeled  using  the  GLD. 

These  results,  when  taken  together,  show  that  the  present  implementation  of  Powell’s  Algo¬ 
rithm  does  not  significantly  limit  the  GLD.  Even  in  cases  where  the  final  A3  and  A4  values  lie  on 
(or  near)  the  limiting  boundaries,  the  algorithm  was  not  affected  by  the  use  of  a  penalty  function. 
The  algorithm  also  performed  just  as  well  in  an  unconstrained  search  as  it  did  in  the  constrained 
case.  We  therefore  can  safely  move  on  to  investigate  other  issues. 
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V.  Graphical  Analysis  of  the  GlD 

5.1  Introduction 

Mykytka  used  the  tables  in  [9:87-117]'  to  identify  the  tentative  range  of  03-04  combinations 
that  are  possible  to  model  using  the  GLD  [9:22],  which  is  depicted  here  in  Figure  1.  He  also  presents 
a  contour  map  of  03  and  04  values  for  combinations  of  A3  and  A4  each  restricted  to  the  range  (0, 1) 
We  reprint  his  contour  map  ([9:39])  here  as  Figure  5. 

The  tables  used  to  generate  Figures  1  and  5  are  not  necessarily  complete.  As  we  mentioned 
in  Chapter  II,  the  smallest  value  of  04  listed  in  each  table  for  a  given  03  does  not  necessarily 
represent  the  minimum  value  of  04  that  is  possible  for  that  value  of  03  using  the  GLD.  It  is  merely 
the  smallest  value  found  for  which  the  optimal  solution  of  the  objective  function,  Equation  14, 
was  approximately  zero.  It  may,  therefore,  be  possible  that  values  of  the  lambda  parameters  for 
values  of  0:4  smaller  than  those  shown  in  the  table  do  indeed  exist,  but  for  unknown  reasons,  the 
FORTRAN  program  was  unable  to  find  them. 

The  research  outlined  in  this  chapter  attempts  to  extend  the  boundary  presented  in  Figure  1 
using  a  different  approach  to  the  problem.  As  a  result  of  this  effort,  we  will  discover  an  interesting 
simplification  to  the  equations  for  03  and  <14  that  holds  under  certain  conditions. 

5.2  Background 

An  examination  of  Mykytka’s  tables  [9:87-117]  shows  that  the  A3  and  A4  values  tend  to  repeat 
certain  behaviors.  A  representative  table  is  reprinted  here  in  Table  5.  As  Table  5  shows,  for  a  given 
value  of  03,  as  the  value  of  04  is  decreased  (moving  from  the  bottom  to  the  top  of  a  particular 
table),  the  corresponding  A3  value  starts  negative,  and  increases  towards  zero.  The  value  of  A3  then 
takes  on  a  positive  value,  increrises  for  a  while,  and  then  decreases  towards  zero.  The  corresponding 

‘These  tables  can  also  be  found  in  Ramberg  et  at  [13]. 
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valui's  also  start  as  negatives,  increase  towards  zero,  and  become  positive,  but  then,  instead 
of  decreasing  towards  zero,  the  A4  values  continue  to  increase. 

Since  this  behavior  is  repeated  for  most  of  the  tables  of  03  and  04  found  in  [1 3] ,  we  hypothesize 
that,  for  a  given  value  of  03,  the  minimum  value  of  0-4  possible  using  the  GLl)  will  occur  when 
A3  =  0. 

Further  evidence  of  this  can  be  seen  in  Figure  5.  This  figure  shows  a  subset  of  the  values  of 
03  and  04  that  result  from  given  combinations  of  A3-A4.  We  can  see  that  for  a  given  value  of  03, 
the  minimum  04  occurs  when  A3  =  0.  On  the  other  hand.  Figure  5  only  shows  a  small  (although 
important)  subset  of  the  possible  values  of  A3  and  A4,  i.e.  those  between  zero  and  1,  so  we  might 
witness  different  behavior  for  value  of  A3  and  A4  outside  of  this  range. 

As  a  first  step  in  attempting  to  prove  this  hypothesis,  we  will  attempt  to  expand  Figure  5  by 
exploring  regions  outside  of  the  range  0  <  A3,  A4  <  1.  By  extensively  examining  these  regions,  we 
will  hopefully  find  at  least  a  graphical  verification  of  our  assumption. 

It  should  be  noted  here  that  as  we  extend  the  range  of  points  covered,  we  will  most  likely 
encounter  “less-useful”  (or,  at  least,  less  intuitively-appealing)  pdfs.  The  shaded  region  of  Figure 
1,  which  signifies  the  range  of  03-04  combinations  that  can  Le  modeled  using  the  GLD,  already 
contains  most  of  the  commonly-used  pdfs.  The  only  group  of  pdfs  not  fully  covered  are  those  that 
can  be  represented  by  the  U-  and  J-shaped  beta  distributions.  An  examination  of  the  GLD’s  pdf. 
Equation  2.1,  shows  that  as  A3  is  increased  above  one,  the  resulting  pdf  will  have  a  non-zero  (i.e. 
positive)  value  at  its  left  endpoint  (i.e.,  /(x)  >  0  at  the  lower  limit  of  the  distribution’s  range). 
Similarly,  as  A4  is  increased  beyond  one,  the  resulting  function  will  have  a  non-zero  value  at  its  right 
endpoint.  Since  most  of  the  U-  and  J-shaped  beta  distributions  pdf’s  exhibit  similar  behavior,  we 
suspect  that  by  extending  the  range  beyond  0  <  A3,  A4  <  1,  we  will  encounter  (>3-04  combinations 
characteristic  of  these  distributions. 
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A3  *  0.30 


i4 

LAH  1 

LAB  2 

LAB  3 

2.0 

-1.550 

.2660 

.0000 

2.2 

-1.164 

.2755 

.0380 

2.4 

-.871 

.2733 

.0695 

2.6 

-.642 

.2586 

.0911 

2.8 

-.474 

.2323 

.0983 

3.0 

-.362 

.1991 

.0925 

3.2 

-.288 

.1641 

.0796 

3.4 

-.239 

.1298 

.0640 

3.6 

-.204 

.0973 

.0481 

3*8 

179 

.0671 

.0330 

4,0 

•'.16  0 

.0389 

.0190 

4.2 

-.144 

.0127 

.6175+ 

4.3 

-.138 

.0789+ 

.0380+ 

4.4 

-.131 

-.0116 

-.5554+ 

4.5 

-.129 

-.0231 

-.0110 

4.6 

-.121 

-.0343 

-.0163 

4.8 

-.113 

-.0554 

-.0260 

5.0 

-.10  5 

-.0752 

-.0350 

5.2 

-.100 

-.0939 

-.0432 

5.4 

-.094 

-.1114 

-.0508 

5.6 

-.089 

-.1279 

-.0578 

5.8 

-.085 

-.1435 

-.0643 

6.0 

-.081 

-.1582 

-.0703 

6.2 

-.078 

-.1722 

-.0759 

6.4 

-.075 

-.1854 

-.0811 

6.6 

-.072 

-.1979 

-.0860 

6.8 

-.069 

-.2100 

-.0906 

7.0 

-.067 

-.2214 

-.0949 

7.2 

-.065 

-.2325 

-.0990 

7.4 

-.063 

-.2427 

-.1028 

7.6 

-.061 

-.2H8 

-.1064 

7.8 

-.060 

-.2623 

-.1098 

8.0 

-.058 

-.2716 

-.1131 

8.2 

-.056 

-.2805 

-.1162 

8.4 

-.055 

-.2889 

-.1191 

8.6 

-.054 

-.2971 

-.1219 

8.8 

-.053 

-.3050 

-.1246 

9.0 

-.052 

-.3125 

-.1271 

9.2 

-.051 

-.3197 

-.1295 

'l^ible  5.  Sample  Table  of  aa-a^  Combinations:  03  =  0.3, 2.0 


LAfl  4 


.7020 

.5556 

.4348 

.3324 

.2495 

.1859 

.1377 

.1003 

.0704 

.0460 

.0255 
.80  3  5+ 
.0489+ 
-.70  57+ 
-.0139 
-.0203 
-.0319 
-.0423 
-.0517 
-.0601 

-.06  78 
-.0748 
-.0812 
-.0872 
-.0927 
-.0977 
-.1025 
-.1069 
-.1111 
-.1149 

-.1186 

-.1220 

-.1253 

-.1284 

-.1313 

-.1341 

-.1367 

-.1392 

-.1416 

>  04  >  9.2 


5.3  A  Simplification  of  the  Equations  for  03  and  <>4 


A  nice  sidelight  to  our  hypothesis  is  tlial  when  A3  =  0,  the  equations  for  03  and  (Equations 
8  and  9)  can  be  written  in  a  much  simpler  form.  We  make  use  of  the  fact  that 


=  /?(a,  1)  = 


r(i)r(a) 

r(a+l) 


I 

a 


so  that  when  A3  =  0,  the  equations  for  the  intermediate  parameters  A,  B,C,  and  D  (given  by 
Equations  10  through  13)  can  be  reduced  to: 


A 


1  +  A4 
1  +  A4 


2X\ 

(1  +  A4)(1  +  2A4) 


C  =  1_3/?(1,1  +  A4)  +  3/?(1,1  +  2A4)- 
(1  +  A4)(1  +  2A4)(1  +  3A4) 


D  =  1-4/?(1,1  +  A4)  +  6/?(1,1  +  2A4)-4;0(1.1  +  3A4)  +  — 

1  +  4A4 

_ 24A^ _ 

(I  +  A4)(1  +  2A4)(1  +  3A4)(1  +  4A4) 


The  formulas  for  03  and  04  become: 


03 


2(1  +  A4)\/1  +  2A4 

I  +  3A4 
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2(1  -(■  2A4)(5  +  3X4  +  16A^  -f-  12A^) 

(1  +  3A4)(1  +  4A4) 


Note  that  the  equations  for  03  and  04  are  now  functions  of  a  single  variable,  A4.  Since  we  have  two 
equations  in  a  single  unknown,  we  should  be  able  to  solve  for  A4  when  given  a  specific  03  value.  If 
our  hypothesis  is  valid,  we  can  then  use  this  value  of  A4  to  find  the  minimum  possible  value  of  04 
for  the  given  value  of  03  and  we  can  therefore  establish  a  Icwer  limit  for  the  kurtosis. 

Because  of  the  symmetry  of  A3  and  A4  in  the  equations  for  A,  B,  C,  and  D,  we  also  found 
that,  for  the  case  where  A4  =  0,  the  equation  for  04  remains  the  same  (with,  of  course,  A3  replacing 
A4),  while  that  for  03  becomes  the  negative  of  the  A3  =  0  case,  i.e.: 


2(1  +  A3)'\/1  +  2A3 

~  iTsAT 


2(1  -f  2A3)(5  +  3A3  +  16A§  +  12Ag) 
(1  +  3A3)(1  +  4A3) 


5.4  Procedure 


A  FORTRAN  program  was  written  which  calculates  the  013-04  values  which  correspond  to 
a  specified  combination  of  A3  and  A4.  The  program  was  adapted  from  the  subroutine  in  the 
FORTRAN  code  of  Mykytka  and  Ramberg  [10]  that  accomplished  the  same  task.  This  program 
will  be  a  valuable  tool  for  subsequent  analysis,  since  we  will  be  able  to  restrict  our  attention  to 
only  valid  03-04  combinations.  Instead  of  choosing  a  combination  of  skewness  and  kurtosis  and 
running  the  original  FORTRAN  program  (as  Mykytka  [9]  did) — hoping  the  GLD  can  find  a  A3-A4 
combination  to  replicate  them — we  will  do  something  much  easier.  By  setting  the  values  for  A3 
and  A4  first,  we  know  that  the  resulting  03  and  04  values  must  be  within  the  GLD’s  range  (jis  long 
as  only  A3-A4  combinations  in  Regions  3  and  4  are  used).  By  checking  a  large  enough  subset  of 
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A3-A4  combinations,  we  hopefully  will  be  able  to  get  a  better  idea  of  the  range  of  possible 
combinations  that  can  be  modeled  using  the  GLD. 

5.5  Results 

We  originally  started  by  re-examining  the  range  0  <  A3,  A4  <  1.  Since  an  infinite  number  of 
different  A3-A4  combinations  are  possible  over  this  range,  we  obviously  can  only  look  at  a  small 
fraction  of  the  total  number  of  values.  Figure  6  shows  the  combinations  of  03  and  04  that  result 
when  A3  and  A4  are  each  iterated  over  this  range  in  steps  of  0.01  each.^  We  can  note  several 
interesting  details  from  Figure  6.  First  of  all,  the  03-04  combinations  are  symmetric  around  the 
03  =  0  axis.  This  is  expected,  since  Equations  8  and  9  show  that  when  the  values  of  A3  and  A4 
are  interchanged,  the  resulting  distribution  has  the  same  04  value,  but  has  a  03  value  that  is  the 
negative  of  the  original.  Therefore,  we  should  expect  this  type  of  symmetric  behavior. 

Figure  6  also  shows  that  there  is  an  apparent  lower  limit  on  the  possible  04  values,  but  that 
the  minimum  possible  04  value  for  any  particular  case  depends  on  the  associated  value  of  03.  For 
example,  we  can  see  that  a  minimum  04  value  of  approximately  1.8  is  observed,  but  only  when 
0:3  =  0.  When  03  =  1,  the  minimum  observed  value  of  04  is  only  3.5. 

Figure  7  shows  the  03-04  combinations  that  result  when  we  set  A3  =  0  and  iterate  A4  over 
the  range  in  0.01  intervals.  Comparing  Figures  6  and  7  shows  that  the  minimum  04  value  for  a 
given  value  of  03  (03  >  0)  does  indeed  occur  when  A3  =  0.  Figure  8  shows  the  results  when  we  set 
A4  =  0  and  vary  the  value  of  A3  over  the  same  range.  Obvioudly,  when  03  <  0,  the  minimum  value 
of  04  (for  a  given  value  of  03)  occurs  when  A4  =  0,  as  we  would  expect.  The  two  figures  also  show 
that  when  03  =  0,  setting  either  A3  or  A4  to  zero  results  in  thelsame  minimum  value  of  04. 

So,  over  our  limited  range  of  values,  it  seems  that  our  hypothesis  is  valid.  However,  what 
happens  when  we  iterate  over  the  same  range  in  intervals  smedler  than  0.01?  Does  our  hypothesis 

^We  disregard  the  combination  A3  =  A4  =  0  for  this  case,  as  well  as  all  future  cases,  since  it  yields  a  GLD  with 
an  invalid  pdf. 
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Figure  6.  03-04  Combinations  for  0  <  Aj,  A4. <  1 

still  hold?  To  answer  this  question,  we  studied  cases  where  the  step  size  was  set  as  low  as 
0.0001.  These  additional  points,  when  added  to  Figure  6,  merely  “filled  in”  the  range  above  our 
previous  limits.  Setting  A3  (or  A4,  depending  on  the  range  of  interest)  equal  to  zero  still  gave  the 
smallest  04  value  for  a  given  value  of  03. 

Our  next  step  was  to  attempt  to  expand  our  hypothesis  to  regions  beyond  the  range  0  < 
A3,  A4  <  1.  Although  we  could  not  inspect  every  possible  A3-A4  combination,  we  tried  to  examine  * 
typical  cases  over  the  entire  range  of  possible  values. 


/  *' 
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Figure  7.  03-04  Combinations  for  A3  =  0, 0  <  A^  <  1 

To  do  this,  we  first  needed  to  determine  maximum  values  for  A3  and  A4.  The  lower  bound 
on  A3  and  A4  had  already  been  given  by  Equation  5,  mjn(A3,A4)  >  —5.  However,  there  were 
no  restrictions  on  the  largest  possible  values  these  parameters  could  have.  Through  trial  and 
error,  a  working  upper  bound  of  A3,A4  <  40  was  proposed.  Although  specific  combinations  of  A3 
and  A4  exist  where  one  of  the  parameters  has  a  value  greater  than  40,  we  wished  to  study  those 
combinations  where  we  can  vary  both  variables  over  their  entire  ranges  without  encountering  any 
undefined  03-04  combinations.  We  do  not  expect  that  we  are  dismissing  a  large  number  of  A3-A4 
combinations  by  using  this  assumption,  and  it  gives  us  an  upper  bound  with  which  to  work. 

Figure  9  shows  the  resulting  03-04  combinations  when  A3  and  A4  were  varied  over  the  range 
0  <  A3,  A4  <  40  using  a  step  size  of  1.0.  Due  to  limitations  in  the  capacity  of  our  plotting  program, 
this  larger  step  size  was  needed  to  plot  this  entire  range.  We  expect  that  as  we  saw  before,  as 
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Figure  8.  «3-a4  Combinations  for  0  <  A3  <  1,  A4  =  0 

the  step  size  is  decreased  (thereby  increasing  the  total  number  of  A3-A4  combinations  plotted), 
the  new  points  will  be  located  in  positions  that  will  cause  the  open  area  above  the  parabolic  curve 
to  be  “filled  in.” 

Figures  10  and  11  show  the  resulting  03-0^  combinations  when  we  set  one  of  the  parameters 
equal  to  zero  and  varied  the  other  over  the  range  from  zero  to  40,  using  a  step  size  of  0,01.  Figure 
10  set  A3  =  0  and  varied  A4  over  this  range,  while  Figure  11  set  A4  =  0  and  varied  A3. 

By  comparing  Figures  10  and  11  to  the  complete  set  of  points  in  Figure  9,  we  can  see  that 
these  two  subsets  of  values  do  indeed  encompass  the  minimum  04  values  for  all  of  the  03  values. 
Figure  12  shows  the  results  of  Figures  10  and  11  together  on  the  same  set  of  axes. 

By  examining  these  two  subsets  of  values  in  Figure  12  together,  it  is  obvious  that  over  a^’s 
negative  range,  setting  A3  =  0  produces  the  corresponding  minimum  04  value,  while  over  03 ’s 


positive  range,  setting  A4  =  0  produces  the  minimum  04  value — the  “opposite”  of  our  results  from 
the  earlier  case,  where  A3  and  A4  were  restricted  to  values  between  zero  and  one. 

Figure  13  shows  the  area  where  the  two  sets  of  points  cross  in  more  detail.  Since  Figure  12 
includes  a  subset  of  the  range  of  values  shown  in  Figures  7  and  8,  we  can  see  where  these  earlier 
points  lie  in  our  extended  range.  Obviously,  the  points  in  Figure  7  correspond  to  the  A3  =  0  points 
found  above  the  A4  =  0  curve  in  Figure  12  (in  the  range  where  03  >  0).  Similarly,  the  points  found 
in  Figure  8  correspond  to  the  A4  =  0  points  found  above  the  A3  =  0  points  in  Figure  12  (03  <  0). 

An  examination  of  the  A3-A4  combinations  used  to  generate  Figure  13  shows  that  these“uppcr” 
points  correspond  to  the  cases  where  the  non-zero  parameter  had  a  value  less  than  1.0.  The  two 
points  shown  at  03  =  0  in  Figure  12  correspond  to  the  pairs  A3  =  0,  A4  =  1  and  A3  =  1,A4  0.  As 

the  non-zero  parameter  is  increased  above  1.0,  the  “lower”  part  of  the  respective  curve  is  formed. 
As  it  is  decreased  below  1.0,  the  “upper”  part  of  the  curve  is  generated. 

It  therefore  appears  that  the  sets  of  values  given  in  Figures  7  and  8  are  not  the  minimum 
04  values  that  are  possible  using  the  GLD.  By  setting  either  A3  or  A4  equal  to  zero  (depending  on 
whether  a  negative  or  positive  value  of  03,  respectively,  is  desired)  and  allowing  the  other  to  take 
on  values  above  one,  we  can  attain  smaller  values  of  04  for  the  same  values  of  03. 

To  this  point  in  our  analysis,  we  have  only  examined  A3-A4  combinations  in  Region  3.  How¬ 
ever,  there  are  also  valid  A3-A4  combinations  in  Region  4.  What  values  of  03  and  04  do  these 
combinations  yield? 

Figure  14  shows  the  03-04  combinations  that  result  when  A3  and  A4  are  varied  over  the  range 
—  .25  <  A3,A4  <  0  using  a  step  size  of  0.01.  We  can  see  that  althougli  the  range  of  03  is  similar 
to  the  previous  C2ises,  the  resulting  values  of  04  are  much  larger.  Figure  15  shows  only  the  lower 
portion  of  this  range,  using  a  smaller  step  size  (thereby  showing  more  points).  Obviously,  none  of 
these  A3-A4  combinations  yield  a  04  value  lower  than  those  we  have  already  found,  but  instead  “fill 
in”  the  upper  range  of  03-04  combinations  not  covered  by  the  A3-A4  combinations  of  Region  4. 
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5.6  Conclusions 

Our  original  hypothesis  weis  that  for  a  given  value  of  03,  the  minimum  possible  value  of  04 
that  the  GLD  can  produce  occurs  when  A3  =  0.  When  we  examined  the  range  0  <  A3,A4  <  1, 
it  appeared  that  this  hypothesis  was  at  least  partly  true.  However,  when  we  examined  A3-A4 
combinations  outside  of  this  range,  we  noticed  that  a  lower  value  of  04  could  be  found  for  the  same 
03- 

We  must  therefore  revise  our  original  hypothesis.  A  more  reasonable  one  seems  to  be  that  for 
a  given  non-negative  value  of  03,  the  minimum  04  value  that  the  GLD  can  produce  occurs  when 
A4  =  0  and  A3  >  1.  If  as  is  negative,  the  minimum  04  value  occurs  when  A3  =  0  and  A4  >  1. 

If  we  look  at  the  sample  table  given  in  Figure  5,  we  can  most  likely  “improve”  on  its  smallest 
Qf4  value.  We  expect  that  as  we  decrease  the  value  of  04  below  the  smallest  tabulated  value,  the 
corresponding  A3  values  should  decrease  to  zero,  and  then  rise  again  to  values  above;  1.0  at  its 
true  minimum.  The  corresponding  values  of  A4  should  decreases  to  zero,  and  become  zero  for  the 
minimum  possible  04.  This  will  definitely  increase  the  GLD’s  present  range. 

Although  we  have  only  looked  at  only  a  small  subset  of  the  A3-A4  combinations,  we  can  be 
confident  of  our  results.  It  was  mentioned  previously  that  we  needed  to  use  a  fairly  large  step  size 
in  some  of  our  analysis.  When  this  step  size  was  decreased,  we  did  not  witness  any  unexpected 
behavior.  Instead,  these  additional  point  simply  fill  in  the  03-04  space  above  our  minimum  04 
curve.  If  we  were  to  use  an  infinitesimally  small  increment  between  A3  and  A4  values,  we  expect  to 
see  a  solid  region  above  the  minimum  04  line,  as  shown  by  Figure  16. 

As  a  final  result.  Figure  17  shows  both  the  original  coverage  region  of  the  GLD  (from  Figure 
1)  and  our  newly-covered  03-04  combinations  (denoted  by  the  darker  shaded  region).  Although 
we  can  not  reach  the  boundary  of  the  Impossible  Area,  we  can  indeed  cover  a  larger  portion. 
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Figure  17.  Expanded  Coverage  Range  of  the  GLD 
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Table  6.  Four  Uncovered  oa-a/i  Combinations 

type  of  shapes  we  can  expect  from  other  pdfs  that  fall  into  these  regions.  Figures  18  through  21 
show  the  four  resulting  pdfs.  Note  that  these  four  pdfs  seem  to  be  “extreme”  examples  of  beta 
distributions — interesting,  but  perhaps  not  very  useful. 


Figure  18.  Distribution  Function  for  Case  1 


To  see  this  point,  recall  that  one  of  our  reasons  for  wanting  to  expand  the  GLD’s  range  was 
to  make  it  more  useful  as  a  simulation  tool.  By  expanding  its  range,  we  can  model  a  wider  range 
of  03-04  combinations,  and  hence  a  wider  range  of  potential  empirical  data  sets.  However,  we 
must  also  look  at  the  situation  realistically.  In  ordinary  simulations,  we  do  not  expect  to  see  many 
empirical  data  sets  that  resemble  the  pdfs  shown  in  Figures  18  through  21. 


Figure  21.  Distribution  Function  for  Case  4 

Is  it  worthwhile  to  attempt  to  expand  the  GLD  to  cover  the  remainder  of  these  03-0^  com¬ 
binations?  The  answer  tc  that  question  seems  like  it  would  depend  on  whom  you  asked.  While 
mathematicians  might  be  dismayed  that  we  can  not  cover  all  the  possible  situations,  simulation 
users  might  be  satisfied  that  they  can  model  such  a  wide  range  of  different  distributions  using  a 
single  pdf.  We  tend  to  fall  into  the  latter  group.  It  is  slightly  disappointing  that  we  have  not  been 
able  to  expand  the  GLD  to  the  boundary  of  the  Impossible  Region  of  Figure  17,  but  we  are  happy 
with  what  we  have  done.  The  rcinge  of  distributions  we  cannot  model  using  the  GLD  seems  to  be 
“extreme”  cases  that  will  not  be  of  much  practical  use.  Therefore,  our  efforts  can  be  considered, 
at  least,  a  partial  success. 


57 


VII.  L-Moments  for  the  GLD 


7.1  Introduction 

In  determining  the  appropriate  values  for  the  four  GLD  parameters,  the  method  of  moments  is 
commonly  used.  There  are  several  reasons  for  this.  First,  the  concept  of  the  moments  is  something 
that  can  be  understood  by  the  majority  of  users.  Second,  the  first  four  moments  (mean,  variance, 
skewness,  and  kurtosis,  respectively)  of  a  pdf  or  empirical  data  set  can  usually  be  calculated  fairly 
easily.  Third,  since  P.amberg  and  Schmeiser  [12]  developed  equations  for  the  first  four  moments  as 
functions  of  the  four  GLD  parameters,  we  can  “match”  most  any  distribution  or  data  set  using  the 
GLD  by  simply  determining  appropriate  values  for  the  four  A  parameters. 

Unfortunately,  there  are  some  problems  with  this  approach.  As  we  have  already  mentioned, 
the  GLD  cannot  “mimic”  all  possible  combinations  of  skewness  and  kurtosis.  Although  it  can 
represent  most  of  the  commonly-used  pdfs,  there  are  some  it  annot.  Secondly,  in  empirical  data 
sets,  there  can  be  a  large  variability  in  the  higher-order  moments,  i.e.  the  skewness  and  kurtosis. 
Since  these  moments  are  based  on  the  third  and  fourth  powers  of  the  difference  between  each 
sample  point  and  the  mean,  one  abnormal  data  point  can  create  a  large  change  in  these  measures, 
especially  when  the  sample  data  set  is  relatively  small.  It  may  well  be  the  case  that  this  outlying 
value  is  important  to  the  overall  nature  of  the  underlying  distribution,  but  often  such  occurrences 
are  simply  bad  luck  in  sampling. 

Since  the  GLD  relies  on  the  given  values  of  skewness  and  kurtosis  so  heavily,  the  resulting 
distribution  can  be  significantly  altered  if  incorrect  values  are  used.  It  therefore  may  be  worthwhile 
to  investigate  other  methods  for  obtaining  the  values  of  the  GLD  parameters.  Hosking  [3]  has 
presented  an  alternative  to  the  use  of  moments,  that  of  L-moments.  These  L-moments  are  based 
on  order  statistics  and  are  supposedly  not  as  susceptible  to  abnormal  data  points  as  are  the  measures 
of  skewness  and  kurtosis. 
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1.2  Derivation 


Hosking  defines  the  first  four  L-moments  via: 


Ai 


A2 


A3 


A4 


f  R{p)(ip 

Jo 

[  Rip)  ■  (2p  -  l)dp 
Jo 

f  R(p)  ■  (Gp^  -  dp  +  l)(lp 
Jo 

I  Rip)  ■  (20p^  -  30;r  +  12p  +  l)rfp 
Jo 


(20) 

(21) 

(22) 

(23) 


where  R(p)  is  simply  the  percentile  function. 

As  with  the  commonly-used  measures  of  skewness  and  kurtosis,  Hosking  [3]  chooses  to  define 
the  two  higher  order  moments  as  dimensionless  ratios  relative  to  the  second  order  moment: 


T3 


_  Aa 
Aa 


We  opt  to  follow  that  convention. 

The  four  L-moments  have  similar  roles  to  the  typical  moments.  Since  the  first  L-moment 
is  simply  the  expected  value  of  the  distribution,  it  is  identical  to  the  mean  of  the  dist-ibution. 
Also,  a  symmetric  distribution  will  have  T3  =  0,  just  as  03  =  0  for  symmetric  distributions  using 
conventional  moments.  According  to  Hosking,  however,  T3  and  T4  are  more  stable  measures  than 
03  and  04,  and  therefore  better  estimates  for  empirical  data. 


For  the  GLD,  the  L-moment  equations  become: 


Ai  = 


•^1  + 


A4~A.-s 

A3(A3+1)(A4+1) 


(24) 
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A2{A3+n(A5+l»HA4+l)(A4+-J) 


(25) 


A2  = 


_  (A)-X,)(A<-fl)('A<-(-21(A<+3)-(Al-A<)(Aj+H(A.,+-.;)(A,+:!) 

^  ( A3+3)f  A4+3j[A3(  A4  +  I )( A* +2 )+A4(A3+1)(Ai+2)]  ^ 


(26) 


(A;-3A;  +  2A.,)(Avfn(A<4-21(A4+3VA,.f-t)4-(A)-3A;+2A<)fATt-n(A)+2)(A3+3)(A,+  D 

(A3+3)(A3+4)(A4+3j(A4+4)[A3(A4+l)(A3+2)+A4(A3  +  ! )( A3+2)]  ' 


(27) 


A  complete  derivation  of  these  equations  can  be  found  in  appendix  A. 

1.3  V St  fulness  of  L- Moments 

Although  Equations  24  through  27  look  complicated,  note  that  they  are  all  polynomials.  • 
Unlike  the  GI,D  equations  (Equations  6  through  9)  based  on  typical  moments,  the.se  have  no  beta 
functions.  Since  computerized  solution  algorithms  for  systems  of  polynomial  equations  are  fairly 
common  and  easily  adaptable,  it  may  be  easier  to  determine  the  values  of  the  GLD  parameters 
using  L-moments  instead.  We  have  already  noted  that  according  to  Ilosking  [3],  73  and  arc  more 
“stable”  measures  than  the  currently  used  moments,  03  and  a^.  Perhaps  the  method  of  inoments 
could  be  implemented  with  L-moments  rather  than  the  more- familiar  typical  (standard)  moments 


l.f  (The  Problem  of)  Computing  L-moments  _ 

There  are  some  problems  with  L-moments  as  well.  The  “sample”  L-mornents  are  based  on 
order  statistics; 

A,  =  E{X) 

A2  =  ^£'(A-2  2  -  X12) 

Aj  =  ^E(^3:3  — ‘^X23  +  Xis) 
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where  A'„  j  denotes  the  a‘^  order  statistic  in  a  group  of  size  b.  A  review  of  the  available  litera¬ 
ture  does  not  show  liow  to  adapt  these  measures  to  samples  of  a  larger  size.  Therefore,  we  have 
hypothesized  two  ways  of  doing  this. 

First,  we  could  define  each  measure  as  the  average  of  the  results  of  every  possible  subset  of 
the  proper  size  within  the  data  set.  For  c.xample,  in  calculating  the  value  A2  for  a  sample  of  size 

/io\ 

n  =  10,  we  would  calculate  the  value  of  An  for  all  |  2  1  =  *15  possible  combinations  and  average 
the  results.  This  is  obviously  an  unacceptable  option  since  the  amount  of  work  required  quickly 
becomes  prohibitive,  even  for  relatively  small  data  sets. 

j 

A  second  possible  method  would  be  to  derive  the  empirical  cumulative  distribution  function 
of  the  data  set,  and  create  finite  summations  to  approximate  Equations  21  through  23.  This  seems 
to  be  the  more  rea.sonable  approach,  since  the  amount  of  work  required  is  much  less  than  that 
involved  in  calculating  each  possible  combination.  If  an  effective  means  of  calculating  sample  L- 
momentsican  be  found,  we  can  use  Equations  24  through  27  in  place  of  the  more  complicated  GLD 
moment  lequations  to  estimate  appropriate  values  for  the  Aj.  Relative  computational  ease  may 
simplify  further  research  into  the  GLD. 

Unfortunately,  the  quality  of  these  estimators  remains  to  be  seen.  At  this  point,  we  have 
no  way  of  knowing  if  either  method  will  produce  “good”  estimates  (unbiased,  minimum  variance, 
etc.).  To  test  the  consistency  of  these  approaches,  it  might  be  reasonable  to  conduct  test  rases  with 
known  pdfs  to  compute: 

1.  The  theoretical  L-moments— Hosking  [4]  presents  these  for  many  of  the  commonly-used  pdfs. 

2.  An  exhaustive  enumeration — take  samples  of  various  sizes  from  the  pdfs  and  compute. 

3.  The  empirical  cdf  of  those  same  samples. 


and  compare  the  results. 


17//.  Conclusions/Suggcstions  for  Future  Research 


This  thesis  was  undertaken  to  study  the  Generalized  Lambda  Distribution  in  depth,  focusing 
particularly  on  the  reasons  behind  the  limitation  on  the  range  of  skewness-kurtosis  combinations 
it  can  assume.  Our  goal  was  to  e.xpand  this  range  so  that  the  GLD  would  be  capable  of  modeling 
any  possible  probability  den.sity  function  or  empirical  data  set.  Although  we  were  able  to  expand 
the  GLD’s  range  beyond  its  previous  limits,  we  fell  short  of  our  goal  of  total  coverage.. 

What  is  the  problem?  It  could  be  a  number  of  things.  Other  general-purpose  methods  for 
fitting  distributions  to  data  (such  as  the  Johnson  and  Pearson  systems)  must  utilize  more  than 
one  functional  form  for  the  distribution’s  pdf  to  cover  the  entire  range  of  values  shown  in  Figure 
1.  Perhaps  this  is  the  case  for  the  GLD  as  well.  We  also  could  be  faced  with  a  theoretical  limit  on 
possible  combinations  of  03  and  0-4  that  can  be  modeled  using  the  GLD  in  its  present  form. 

Further,  since  we  use  the  method  of  moments  to  determine  the  appropriate  values  of  the 
four  GLD  parameters,  we  limit  their  range  to  regions  where  the  first  four  moments  are  defined. 
This  thesis  has  given  a  description  of  an  alternative  to  the  method  of  moments  (L-moments),  and 
other  methods  for  determining  the  parameter  values  have  been  documented.  Perhaps  one  of  these 
approaches  may  yield  new  GLD  forms  that  the  method  of  moments  cannot. 

The  limit  may  be  the  result  of  analytical  considerations,  as  well.  Due  to  the  complexity  of 
the  GLD,  we  can  not  solve  for  the  parameter  values  in  an  easy  fashion;  instead  we  must  rely  on 
computerized  searches  to  determine  appropriate  lambda  values.  The  equations  for  03  and  04,  (8) 
and  (9),  both  require  evaluation  of  beta  functions.  The  beta  functions  in  turn  require  evaluation 
of  gamma  functions.  Since  finding  exact  values  for  the  gamma  function  is  not  computationally 
tractable,  we  are  instead  forced  to  use  approximation  techniques.  These  approximations  might  be 
a  source  of  error,  but  we  do  not  believe  so.  However,  if  it  is,  hopefully  the  L-moment  method, 
which  uses  only  polynomials  to  determine  the  parameter  values,  will  yield  better  results. 
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Over  the  course  of  our  research,  we  have  learned  a  great  deal  more  about  the  GLD.  We  have 
been  able,  at  least  experimentally,  to  establish  a  lower  limit  on  the  values  of  kurtosis  that  the  GLD 
can  attain,  given  a  specific  skewness.  We  also  have  a  more  thorough  understanding  of  how  Powell’s 
Algorithm  works  and  how  it  functions  when  faced  with  our  penalty  function.  We  saw  that  neither 
the  penalty  function,  nor  the  constrained  region  of  viable  A3-A4  values,  had  an  effect  on  the  the 
algorithm’s  searching  process. 

Where  do  we  go  from  here?  As  a  first  step,  the  concept  of  L-moments  is  worth  a  longer 
look.  At  present,  the  literature  on  L-moments  is  limited.  In  particular,  we  do  not  know  how  to 
effectively  determine  the  L-moments  of  an  empirical  data  set.  We  discussed  two  of  our  own  ideas 
for  doing  this  in  Chapter  VII.  Using  L-moments,  we  can  find  the  values  of  the  four  GLD  parameters 
simply  by  solving  a  system  of  polynomials  rather  than  using  a  non-linear  function  minimization 
required  to  match  the  standard  moments.  Hopefully  their  use  can  expand  the  GLD’s  range  even 
further.  L  -moments  are  not  widely  known,  however,  and  we  need  more  information  about  them  to 
determine  their  usefulness. 

In  summary,  though,  we  must  remember  that  even  with  its  present  restrictions,  the  GLD  is 
still  an  extremely  powerful  tool  for  simulation  studies.  It  allows  the  user  to  model  a  wide  range 
of  pdfs  and  empirical  data  sets,  simply  using  the  GLD  (with  appropriate  parameter  values)  and  a 
pseudo-random  uniform  random  variable  generator.  When  modeling  an  empirical  data  set,  instead 
of  facing  a  tough  decision  between  two  (or  several)  competing  pdfs,  we  can  usually  match  the  first 
four  moments  of  the  data  set  exactly  using  the  GLD— a  much  easier  choice. 
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We  now  make  use  of  the  fact: 
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This  is  equivalent  to: 
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We  use  Equations  36  and  41  to  derive  the  equation  for  -3: 
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We  use  Equations  34,  37  and  the  fact  that; 
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(43) 


Combining  the  first  four  terms  of  Equation  43  yields: 

h  [20(A3  +  1)(A3  +  2)(A3  +  3)  -  30{A3  +  1)(A3  +  2){A3  +  4)] 
+  ^  [12(A3  +  1)(A3  +  3)(A3  +  4)  -  (A3  +  2)(A3  +  3)(A3  +  4)] 


where  D  represents  the  common  denominator; 


D  =  (A3  +  1)(A3  +  2)(A3  +  3)(A3  +  4). 


This  is  equivalent  to; 
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Combining  the  last  four  terms  of  Equation  43  yields: 
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Taking  Equations  *14  and  45  together  yields: 
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Using  Equations  36  and  46,  we  find: 
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